A trip through Swiss politics and history

By
Luis Salamanca
December 21, 2018

In the project “A research platform for data-driven democracy studies in Switzerland” (DemocraSci), we are performing a comprehensive analysis of the Swiss parliament archives. The project is a collaboration between the Chair of Systems Design at ETH Zürich, led by Prof. Frank Schweitzer, and the SDSC. Our aim is to create a database of who said what and when in both chambers of the Swiss parliament over the past 127 years.

The Swiss Federal Archives (Schweizerisches Bundesarchiv) recently digitized the proceedings of both the National Council and the Council of States. Thanks to these efforts, over 40,000 documents covering votes, speeches, laws, amendments to laws, and more, from 1891 to the present day, are now openly accessible. However, without the right tools, a proper analysis of this corpus is infeasible. The aim of the project is, therefore, to structure the corpus into a queryable database. This includes identifying the topics of debates and analyzing speeches to map the political positions and opinions of the members of parliament.

In this project, we will create a so-called knowledge graph out of the proceedings, a task never before carried out on such a vast corpus of political archives. This knowledge graph captures different relationships between political entities. Figure 1 gives a mock example of our planned knowledge graph. The graph consists of nodes (circles in Figure 1) connected by lines, where each line indicates the relationship between the two nodes it joins. For instance, in Figure 1 we can see that Silvia Schenker is a member of the SP party and cosponsored an intervention proposed by Maya Graf.

Figure 1: Mock knowledge graph showing relationships between different entities extracted from the parliamentary proceedings.
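
One common way to hold such a graph in code is as a list of (subject, relation, object) triples. The following is only a minimal sketch of the mock example in Figure 1, not the DemocraSci data model: the relation names (`member_of`, `proposed`, `cosponsored`), the identifier `intervention_1`, and the helper functions are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Each edge is a (subject, relation, object) triple. The relation names and
# the intervention identifier are hypothetical labels for this sketch.
TRIPLES = [
    ("Silvia Schenker", "member_of", "SP"),
    ("Maya Graf", "proposed", "intervention_1"),
    ("Silvia Schenker", "cosponsored", "intervention_1"),
]


def build_index(triples):
    """Group outgoing edges by subject so that lookups are cheap."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return dict(index)


def objects_of(index, subject, relation):
    """Return every node reachable from `subject` via `relation`."""
    return [obj for rel, obj in index.get(subject, []) if rel == relation]


def cosponsors_of(triples, intervention):
    """Return everyone who cosponsored the given intervention."""
    return [s for s, rel, o in triples if rel == "cosponsored" and o == intervention]


index = build_index(TRIPLES)
print(objects_of(index, "Silvia Schenker", "member_of"))
print(cosponsors_of(TRIPLES, "intervention_1"))
```

A flat triple list keeps the sketch short. A graph of the size described in this post would more plausibly live in a dedicated graph or relational database.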

Our knowledge graph will capture the relationships and interactions between political entities such as politicians, parliamentary groups, committees, political parties, bills, interventions, votes, and speeches, for the whole time span from 1891 to today. Such a knowledge graph will be a valuable research tool for political scientists and historians. They will be able to answer a broad range of questions, such as how parties shift their focus over time, what types of conflicts of interest exist and how they arise, and which political topics drive polarization. Moreover, political scientists can analyze how socioeconomic events influence political decisions, study the trends of issues discussed in the councils, and even make predictions about expected voting outcomes or newly forming alliances.

In a project of this magnitude, and with such complex data, we first need to curate the original raw files and extract the useful information from them, enriching it with any extra information that may be useful for subsequent steps. These tasks are carried out in the first work package (WP1) of the DemocraSci project. One important task within WP1 is labeling the text lines of every document according to pre-established categories (see Figure 2, left). For this, we need to detect margins, column separators, and other lines that help to define the different sections of the text, as depicted on the right-hand side of Figure 2. Additionally, we need to ensure that all text lines are properly ordered and correctly grouped into text boxes (see Figure 2, right). After this exhaustive preprocessing pipeline, we end up with a massive corpus of corrected text, mostly belonging to the speeches made by politicians during parliamentary sessions.

Figure 2: On the left, the labeled text lines: headers (blue), first-column text (magenta), second-column text (cyan), footnotes (red), text in header (yellow), and single-column header (black). On the right, the results of the margin and central line detection (green lines), horizontal separators (red boxes), and text boxes (blue boxes). Some small errors remain in both cases, but the process is robust enough to allow the extraction of the text.
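
To give a feeling for how detected separators can drive the labeling and ordering of text lines, here is a deliberately simplified, rule-based sketch. It is not the project's pipeline: it assumes the central separator and the header and footnote boundaries are already known, and the `TextLine` class, the thresholds, and the reduced label set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    x0: float  # left edge of the line
    x1: float  # right edge of the line
    y: float   # vertical position, measured from the top of the page


def label_line(line, center_x, header_y, footnote_y):
    """Assign one coarse category to a text line from its position alone."""
    if line.y < header_y:
        return "header"
    if line.y > footnote_y:
        return "footnote"
    if line.x1 <= center_x:
        return "column_1"
    if line.x0 >= center_x:
        return "column_2"
    return "full_width"  # the line crosses the central separator


def reading_order(lines, center_x, header_y, footnote_y):
    """Order lines by category, then top to bottom within each category.

    Placing full-width lines before both columns is a simplification.
    """
    rank = {"header": 0, "full_width": 1, "column_1": 2, "column_2": 3, "footnote": 4}
    return sorted(
        lines,
        key=lambda l: (rank[label_line(l, center_x, header_y, footnote_y)], l.y),
    )


# Invented coordinates for a two-column page, listed out of order.
page = [
    TextLine("right column, first line", 310, 560, 120),
    TextLine("left column, second line", 40, 290, 140),
    TextLine("Page header", 40, 560, 30),
    TextLine("left column, first line", 40, 290, 120),
    TextLine("1) a footnote", 40, 290, 780),
]
ordered = reading_order(page, center_x=300, header_y=60, footnote_y=760)
for line in ordered:
    print(label_line(line, 300, 60, 760), "|", line.text)
```

Real scans need the separators to be detected per page and tolerate skew and noise, which is exactly where the small errors mentioned in the caption of Figure 2 come from.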

After the extraction of the corpus in WP1, we can proceed with the most interesting part of the project: the natural language processing. This constitutes WP2 of the DemocraSci project, where the main aim is to extract additional entities that further enrich the envisioned knowledge graph, such as the topics discussed and their historical evolution, or the opinions of politicians on different matters. For all of these, there exist well-established techniques on which we can rely. For example, latent Dirichlet allocation (LDA) is, in a nutshell, a technique that extracts lists of relevant words together with their associations with the analyzed documents. Each of these lists comprises terms that can be assigned to a specific category, i.e., a topic. This way, we can quickly summarize the main topics discussed during each session, as well as list the politicians proposing them and those intervening in the discussions.

Given the size and uniqueness of the data set, we can also apply more advanced methods. For example, dynamic LDA will allow us to analyze how topics evolved through time, e.g., how the rhetoric on women’s rights or the Swiss energy policy changed over time. Also, with the use of deep recurrent neural networks, we will perform sentiment analysis on the speeches, to elucidate not only the topics discussed by specific politicians, but also their opinions on those subjects.

All the rich information we gather from the corpus will be integrated into the knowledge graph, which will comprise thousands of entities and the relations between them. Once ready, this graph will be hosted on an interactive web application where researchers, journalists, and the interested public can interact with the data and perform their own analyses. They will be able to go through the topics and find out what was discussed in parliament and when.
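
To make the topic modeling step described above more concrete, the following is a small, self-contained sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is an illustration under stated assumptions, not the project's implementation: the function names, the hyperparameters `alpha` and `beta`, and the four toy documents are invented for this example, and a real analysis would rely on an established library and on the actual speeches.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic, topic_word, vocab): per-document topic counts,
    per-topic word counts, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, n=3):
    """List up to n of the most frequent words assigned to each topic."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]


# Invented toy documents, loosely inspired by the two example themes above.
docs = [
    "energy nuclear power plant energy".split(),
    "power energy nuclear grid".split(),
    "women vote rights suffrage women".split(),
    "rights women vote equality".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topic` says how many tokens of a document were assigned to each topic, which is the "association with the analyzed documents" mentioned above, while `top_words` returns the word lists that a human reader would then name as topics.
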
It will also be possible to explore which arguments were used to win a discussion in parliament, or to examine how politicians changed their positions over the course of their years of service. In addition, machine learning methods could be trained on these data to fit models that predict future political outcomes. This data set opens up challenging research opportunities. By publishing it as fully open access upon successful completion of the project, including the methods and the documentation on how the processing was done, we hope to encourage other digitization projects and strengthen scientific advances to better understand our past and help to shape our future.

Co-authors

  • Lilian Gasser
  • Laurence Brandenberger
  • Prof. Frank Schweitzer

About the author

Luis Salamanca
Lead Data Scientist

Luis is originally from Spain, where he completed his bachelor's studies in electrical engineering and his M.Sc. in signal theory and communications, both at the University of Seville. During his Ph.D., he started focusing on machine learning methods, more specifically message passing techniques for channel coding and Bayesian methods for channel equalization. He carried out his doctoral work between the University of Seville and the University Carlos III in Madrid, also spending some time at EPFL, Switzerland, and Bell Labs, USA, where he worked on advanced techniques for optical channel coding. After completing his Ph.D. in 2013, he moved to the Luxembourg Centre for Systems Biomedicine, where he shifted his interest to neuroscience, neuroimaging, and the life sciences, and to the application of machine learning techniques in these fields. During his four and a half years there as a postdoc, he worked as a data scientist on many different problems, including microscopy image analysis, neuroimaging, and single-cell gene expression analysis. He joined the SDSC in April 2018.

As Lead Data Scientist, Luis coordinates projects in various domains. Several projects focus on the application of natural language processing and knowledge graphs to the study of different phenomena in social and political sciences. In the domains of architecture and engineering, Luis is responsible for projects centered on the application of novel generative methods to parametric modeling. Finally, Luis also coordinates different projects in robotics, ranging from collaborative robotic construction to deformable object manipulation.
