Shared Task on Protest Event Mining
Social scientists often work with structured event data that they extract from news documents. These data are gathered either manually or with the help of traditional NLP techniques such as pattern matching with hand-crafted patterns or statistical classification over bag-of-words document models. The goal of this workshop and shared task is to push forward the state of the art in automated event data extraction.
In this shared task, we invite computational social scientists and NLP researchers to tackle the problem of event data extraction as it arises in one domain of interest to political scientists: public protest. This domain includes events like strikes, demonstrations, riots, terrorist attacks, on-line campaigns, and symbolic protest actions. Our goal is to learn to what extent state-of-the-art NLP technology can facilitate event data extraction. We strive to make this task attractive to NLP researchers working on a wide range of information extraction problems. The task builds on classical event extraction and then extends to the cross-document identification of co-referring event mentions and to chains and networks of events. We are interested in learning about protest forms, actors, locations and times, issues and intensity, as well as the evolution of protest stories over time, protest triggers and targets.
The output from this competition will be of immediate interest to social movement scholars, who rely on protest event data to determine the causes and consequences of mobilization; to political scientists interested in political instability; and to data journalists.
×××
This project started out as a joint effort between the POLCON project of Prof. Hanspeter Kriesi and the political scientists at the University of Wisconsin-Madison led by Prof. Pamela Oliver. The organising committee consists of:
Hanspeter Kriesi is a comparative political scientist who holds the Stein Rokkan Chair in Comparative Politics at the European University Institute and directs the POLCON project (European Research Council project No. 338875). He has previously taught at the universities of Amsterdam, Geneva, and Zurich. From 2005 to 2012, he directed a Swiss national research programme on the “Challenges to democracy in the 21st century”.
Research interests and areas of expertise
His research currently focuses on the study of the political consequences of the Great Recession in electoral and protest terms in Europe: How have the political dynamics been evolving in different European countries as a result of the extent of the national crisis, the international constraints, the government actions and the reactions of the challengers? How tight is the coupling of mobilization in the electoral and the protest arena? How do the dynamics vary between debtor and creditor nations, and between Western and Central-Eastern European countries? To study these dynamics, he would like to combine advanced protest event analysis with the analysis of electoral campaigns and decision-making in 12 European countries.
He is experienced in classical protest event analysis based on manually annotated newspaper data, as well as the analysis of electoral campaigns based on core sentence analysis of newspapers, and the analysis of macro-electoral data and electoral survey data.
Jasmine Lorenzini is a post-doctoral researcher at the European University Institute in Florence and a member of the POLCON project.
Research interests and areas of expertise
She studies how electoral and protest politics influence each other and how they jointly contribute to shaping politics. Currently, Jasmine works on protest in the context of the Great Recession and develops tools to gather data for a study of protest in 30 European countries over 15 years. She is interested in research methods that bring together the social sciences and computational linguistics.
She is experienced in designing data collection procedures for comparative political research and instruments for surveys and interviews. She has experience with annotation for NLP.
Argyris Altiparmakis is a PhD candidate in social and political sciences at the European University Institute, Florence, and a member of the POLCON project.
Research interests and areas of expertise
Argyris is interested in the Great Recession and the European debt crisis, and specifically in their impact on political economy, electoral behavior and protest participation in the countries of Southern Europe. His work is embedded in the POLCON project, which aims to explore changes in protest trends and political structuration brought about by the Great Recession.
Alex Hanna is a PhD candidate in sociology at the University of Wisconsin-Madison.
Research interests and areas of expertise
Substantively, she is interested in social movements, political sociology, media, and the Middle East. She is interested in how new and social media have changed social movement mobilization and politics more generally. Methodologically, she is interested in computational social science, textual analysis, and social network analysis.
Peter Makarov is a PhD candidate in computational linguistics at the University of Zurich and a member of the POLCON project.
Research interests and areas of expertise
He is interested in mathematical methods in NLP and general linguistics, information extraction, and the application of NLP to social science problems.
×××
The goal of this task is to extract structured information about public protest events from newswire documents. Typically, protest events are open to the public, politically motivated and not institutionalised. This includes strikes, demonstrations, riots, terrorist attacks and on-line campaigns, but not, for example, elections. For each protest event, we are interested in a small set of features such as its form, issue, actors, location and time. We also want to know what acted as the immediate trigger of a protest action (e.g. a political decision or inaction, a court ruling, or a political scandal), who or what the protest action is targeted at (e.g. an individual or organisation in a position to act on the issue), and what political reaction the protest action brought about. As an example, consider this excerpt:
Brussels protest demands European plan for oil tanker disasters [...]
Protestors from Spain and other European countries held a demonstration in Brussels on Saturday, calling for a set of measures to deal with oil tanker disasters such as the sinking of the Prestige off the coast of northern Spain. [...]
The protest was organized by the Spanish environmental group Nunca Mais (Never Again) who estimated that around 1,000 to 1,500 took part, including the singer Manu Chao.
Deutsche Presse-Agentur, 14 June 2003
From this document, we would like to know the form of the protest event (demonstration), its location (Brussels), time (Saturday, 14 June 2003), issue (environmental protection), actor (Nunca Mais), size (1,000 to 1,500 demonstrators) and trigger event (the sinking of the Prestige).
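One way to hold the features listed above is a simple record type. The following Python sketch assumes a hypothetical `ProtestEvent` class; its field names are illustrative and not part of the task specification. It encodes the Brussels example, resolving "Saturday" against the publication date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ProtestEvent:
    """Hypothetical record for the protest event features named in the task description."""
    form: str                                  # e.g. "demonstration", "strike", "blockade"
    location: str
    event_date: Optional[date]                 # resolved calendar date, if recoverable
    issue: str
    actors: list[str] = field(default_factory=list)
    size: Optional[tuple[int, int]] = None     # (low, high) estimate of participants
    trigger: Optional[str] = None
    target: Optional[str] = None               # not stated in the Brussels excerpt
    reaction: Optional[str] = None


# The Brussels demonstration from the Deutsche Presse-Agentur excerpt.
brussels = ProtestEvent(
    form="demonstration",
    location="Brussels",
    event_date=date(2003, 6, 14),  # "Saturday", resolved against the publication date
    issue="environmental protection",
    actors=["Nunca Mais"],
    size=(1000, 1500),
    trigger="sinking of the Prestige",
)

print(brussels.form, brussels.location, brussels.event_date.isoformat())
```

Leaving `target` and `reaction` empty reflects that the excerpt does not state them; a system would fill them only when a document supports it.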
Multiple protest events can be mentioned in a single document, and many documents typically describe the same protest event. This is why we are particularly interested in the aggregation of information on protest events from multiple documents.
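A minimal sketch of cross-document aggregation, assuming each document has already yielded event mentions. The `Mention` class, the `aggregate` function, the document identifiers and the grouping key (form, location, date) are all hypothetical; exact-key grouping is a deliberately naive stand-in for event co-reference resolution, not the task's prescribed method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """One protest event mention extracted from a single document (hypothetical schema)."""
    doc_id: str
    form: str
    location: str
    event_date: date
    actors: tuple[str, ...] = ()


def aggregate(mentions):
    """Group mentions that share (form, location, date) and pool their actors.

    Exact-key matching will miss paraphrased forms, alternative place names
    and imprecise dates; it only illustrates the merging step.
    """
    clusters = defaultdict(lambda: {"docs": set(), "actors": set()})
    for m in mentions:
        key = (m.form.lower(), m.location.lower(), m.event_date)
        clusters[key]["docs"].add(m.doc_id)
        clusters[key]["actors"].update(m.actors)
    return dict(clusters)


# Hypothetical mentions: two reports of the Brussels demonstration, one of a Temelin blockade.
mentions = [
    Mention("doc-A", "demonstration", "Brussels", date(2003, 6, 14), ("Nunca Mais",)),
    Mention("doc-B", "Demonstration", "Brussels", date(2003, 6, 14)),
    Mention("doc-C", "blockade", "Austrian-Czech border", date(2007, 2, 21)),
]

events = aggregate(mentions)
for key, info in events.items():
    print(key[0], key[1], key[2].isoformat(), sorted(info["docs"]), sorted(info["actors"]))
```

The two Brussels reports collapse into one event whose actor comes from the document that names it, which is the kind of information pooling that aggregation across documents is meant to achieve.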
We also distinguish sets of protest events, all on the same theme, that are related through reaction/trigger pairs. Sometimes one finds that a trigger event sets off a protest action and the ensuing reaction to it by the challenged actor causes a new wave of protest. The row over the Temelin nuclear power plant located in the Czech Republic is a case in point:
21 February 2007:
Austrian demonstrators briefly closed two border crossings with the Czech Republic on Wednesday to protest at what they say are Czech broken promises to improve the safety of the Soviet-designed Temelin nuclear plant. [...]
The Austrian Chancellor is due to make an official visit to Prague on February 27 which will include a meeting with Czech Prime Minister Mirek Topolanek.
28 February 2007:
Austrian opponents of the south Bohemian nuclear power plant Temelin today again blocked three Austrian-Czech border crossings [...]
Austrian anti-nuclear activists said on Tuesday that Austrian Chancellor Gusenbauer's visit to Prague was "totally disappointing" and produced no results and they would therefore continue blocking the border in protest against Temelin.
We consider the scheduled meeting as the trigger event for the first wave of blockades, and the inaction of the Austrian Chancellor, whose visit to Prague produced no results, as the reaction to these protests and also as the trigger for the new round of blockades.
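Trigger/reaction pairs can be stored as labelled, directed links between events, and a chain recovered by following them forward in time. In this sketch the event identifiers, the relation names `triggers` and `reaction_to`, and the helper functions are assumptions made for illustration; the three events and their dates are taken from the Temelin excerpts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    description: str
    date: str  # ISO date string, kept as text for brevity


# Hypothetical identifiers for the events in the Temelin excerpts.
events = {
    "blockade-1": Event("blockade-1", "Austrian activists block two border crossings", "2007-02-21"),
    "visit": Event("visit", "Chancellor Gusenbauer's visit to Prague", "2007-02-27"),
    "blockade-2": Event("blockade-2", "Activists block three border crossings again", "2007-02-28"),
}

# Directed, labelled links: (source, relation, target).
links = [
    ("visit", "reaction_to", "blockade-1"),
    ("visit", "triggers", "blockade-2"),
]


def successors(event_id):
    """Events that follow event_id in the story: those it triggers or that react to it."""
    out = []
    for src, rel, dst in links:
        if rel == "triggers" and src == event_id:
            out.append(dst)
        elif rel == "reaction_to" and dst == event_id:
            out.append(src)
    return out


def chain_from(start):
    """Follow links forward from start; if several successors exist, only the first is taken."""
    chain, current, seen = [start], start, {start}
    while True:
        nxt = [e for e in successors(current) if e not in seen]
        if not nxt:
            return chain
        current = nxt[0]
        seen.add(current)
        chain.append(current)


for eid in chain_from("blockade-1"):
    print(events[eid].date, events[eid].description)
```

Keeping the relation labels distinct lets one event, here the visit, be both a reaction to one protest and a trigger of the next; a full network of events would need a proper graph traversal instead of this single-path walk.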
The competition will be conducted on English-language newswire documents. The organisers will provide an annotated set of up to 3,000 protest-related newswire documents. The documents are annotated at the token level, with links that span documents. As the identification of relevant documents is a major practical problem in event data collection, we additionally provide a set of newswire documents covering general news.
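As a point of reference for the relevance problem, here is a sketch of the kind of traditional bag-of-words baseline mentioned at the start of this page: a multinomial Naive Bayes classifier with add-one smoothing, written from scratch in Python. The class name, the labels `protest` and `general`, and the six training sentences are invented for illustration and are not drawn from the task corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; a bag-of-words model ignores their order."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def log_posterior(self, doc, c):
        """Unnormalised log P(c | doc); words unseen in training are skipped."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for tok in tokenize(doc):
            if tok in self.vocab:
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return max(self.labels, key=lambda c: self.log_posterior(doc, c))


# Invented toy sentences, not taken from the task corpus.
train_docs = [
    "Workers went on strike and marched to the ministry",
    "Demonstrators blocked the border crossing in protest",
    "Thousands of protesters held a demonstration in the capital",
    "The central bank left interest rates unchanged",
    "The team won the championship final on Sunday",
    "Shares rose after the company reported higher profits",
]
train_labels = ["protest"] * 3 + ["general"] * 3

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("Protesters blocked roads during a demonstration"))
print(clf.predict("The bank reported higher profits"))
```

A usable relevance filter would be trained on the protest-related and general-news documents provided for the task; the toy version only shows the mechanics of scoring a document against each class.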
We envisage interest in this task at varying levels of complexity: detection of protest events in general news, identification of protest event features, aggregation of information on protest events across multiple documents, and identification of chains of protest events. We aim to accommodate submissions at any of these levels.
×××
We provide light annotations for newswire documents published in English. A light annotation approach means that the documents are annotated by experts in protest event analysis rather than by computational linguists. The annotation encodes the answers to specific questions (what, where, when, who, how and why an event happened) and points to the parts of the text that annotators used to interpret the content of the document. The purpose is to create an annotated corpus that represents complex information without itself being complex. To this end, we use task-specific tags that carry no linguistic meaning but highlight the parts of the text that are of interest. The annotations are therefore text-bound: annotators highlight in the text the information they used to code specific features.
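Text-bound annotations of this kind can be stored as tagged character spans and, since the corpus is annotated at the token level, projected onto tokens. The sketch below assumes hypothetical tag names (`ACTOR`, `SIZE`), a half-open offset convention and whitespace tokenisation; none of these is specified by the task. The sentence is taken from the Brussels excerpt.

```python
import re
from dataclasses import dataclass

TEXT = ("The protest was organized by the Spanish environmental group Nunca Mais "
        "(Never Again) who estimated that around 1,000 to 1,500 took part, "
        "including the singer Manu Chao.")


@dataclass(frozen=True)
class Span:
    """A text-bound annotation: a task-specific tag over character offsets [start, end)."""
    tag: str
    start: int
    end: int


def span_for(tag, phrase, text=TEXT):
    """Locate the first occurrence of phrase; raises ValueError if it is absent."""
    start = text.index(phrase)
    return Span(tag, start, start + len(phrase))


# Hypothetical tag names; the task's actual tag set is not given here.
spans = [
    span_for("ACTOR", "Nunca Mais"),
    span_for("SIZE", "1,000 to 1,500"),
]


def to_bio(text, spans):
    """Project character spans onto whitespace tokens as BIO labels.

    Only tokens lying entirely inside a span are labelled; partial overlaps stay "O".
    """
    labels = []
    for m in re.finditer(r"\S+", text):
        label = "O"
        for s in spans:
            if m.start() >= s.start and m.end() <= s.end:
                label = ("B-" if m.start() == s.start else "I-") + s.tag
        labels.append((m.group(), label))
    return labels


for s in spans:
    print(s.tag, repr(TEXT[s.start:s.end]))
print([pair for pair in to_bio(TEXT, spans) if pair[1] != "O"])
```

Storing character offsets keeps the annotation independent of any particular tokeniser, while the BIO projection gives the token-level view that sequence-labelling systems usually expect.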