A trip through Swiss politics and history

By Luis Salamanca
December 21, 2018

In the project entitled “A research platform for data-driven democracy studies in Switzerland” (DemocraSci), we are performing a comprehensive analysis of the Swiss parliament archives. The project is a collaboration between the Chair of Systems Design at ETH Zürich, led by Prof. Frank Schweitzer, and the SDSC. Our aim is to create a database of who said what and when in both chambers of the Swiss parliament over the past 127 years.

The Swiss Federal Archives (Schweizerisches Bundesarchiv) recently digitized the proceedings of both the National Council and the Council of States. Thanks to these efforts, we can now openly access over 40,000 documents covering all votes, speeches, laws, amendments to laws, etc., from 1891 to the present day. However, without the right tools, a proper analysis of a corpus this large is infeasible. The project therefore sets out to structure the corpus into a queryable database. This includes identifying the topics of debates and analyzing speeches to map the political positions and opinions of the members of parliament.

To this end, we will build a so-called knowledge graph from the proceedings, a task never before carried out on such a vast corpus of political archives. A knowledge graph captures the relationships between political entities, and Figure 1 gives a mock example of the one we plan to build. The graph consists of nodes (circles in Figure 1), and each line between two nodes represents a relationship between them. For instance, in Figure 1 we can see that Silvia Schenker is a member of the SP and cosponsored an intervention proposed by Maya Graf.

Figure 1: Mock knowledge graph showing relationships between different entities extracted from the parliamentary proceedings.
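To make the idea of a knowledge graph concrete, here is a minimal sketch in Python that stores the graph as subject-predicate-object triples and answers pattern queries over them. The class name `KnowledgeGraph`, the predicate names (`member_of`, `proposed`, `cosponsored`) and the node `Intervention X` are hypothetical illustrations of the mock Figure 1, not the project's actual schema; a real deployment would more likely rely on a dedicated graph database or RDF store.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of the graph: subject --predicate--> obj (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """A minimal in-memory triple store indexed by subject and by object."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self._by_subject: defaultdict[str, set[Triple]] = defaultdict(set)
        self._by_object: defaultdict[str, set[Triple]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = Triple(subject, predicate, obj)
        self.triples.add(triple)
        self._by_subject[subject].add(triple)
        self._by_object[obj].add(triple)

    def query(self, subject: str | None = None, predicate: str | None = None,
              obj: str | None = None) -> list[Triple]:
        """Return the triples matching a pattern; None acts as a wildcard."""
        # Use an index when possible instead of scanning every triple.
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        elif obj is not None:
            candidates = self._by_object.get(obj, set())
        else:
            candidates = self.triples
        matches = (t for t in candidates
                   if (predicate is None or t.predicate == predicate)
                   and (obj is None or t.obj == obj))
        return sorted(matches, key=lambda t: (t.subject, t.predicate, t.obj))


kg = KnowledgeGraph()
kg.add("Silvia Schenker", "member_of", "SP")
kg.add("Maya Graf", "proposed", "Intervention X")          # hypothetical node name
kg.add("Silvia Schenker", "cosponsored", "Intervention X")

# Who cosponsored the intervention(s) proposed by Maya Graf?
for proposal in kg.query(subject="Maya Graf", predicate="proposed"):
    print([t.subject for t in kg.query(predicate="cosponsored", obj=proposal.obj)])
# prints ['Silvia Schenker']
```

Indexing the triples by subject and by object lets pattern queries avoid a scan of the whole graph, which starts to matter once the graph holds thousands of entities.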

Our knowledge graph will capture relationships and interactions between political entities such as politicians, parliamentary groups, committees, political parties, bills, interventions, votes, and speeches, over the whole time span from 1891 to today. Such a knowledge graph will be a valuable research tool for political scientists and historians. They will be able to answer a broad range of questions, such as how parties shift their focus over time, what types of conflicts of interest exist and how they arise, and which political topics drive polarization. Political scientists will also be able to analyze how socioeconomic events influence political decisions, study trends in the issues discussed in the councils, and even make predictions about voting outcomes or newly forming alliances.

In a project of this magnitude, and with such complex data, we first need to curate the original raw files, extract the useful information from them, and enrich it with any additional information that may be useful in subsequent steps. These tasks are carried out in the first work package (WP1) of the DemocraSci project. One important task within WP1 is labeling the text lines of every document according to a set of predefined categories (see Figure 2, left). For this, we need to detect margins, column separators, and other lines that delimit the different sections of the text, as depicted on the right-hand side of Figure 2. Additionally, we need to ensure that all text lines are in the correct order and are correctly grouped into text boxes (see Figure 2, right). After this exhaustive preprocessing pipeline, we end up with a massive corpus of corrected text, most of which consists of the speeches made by politicians during parliamentary sessions.

Figure 2: On the left, we show the labeled text lines: headers (blue), first-column text (magenta), second-column text (cyan), footnotes (red), text in headers (yellow), and single-column headers (black). On the right, we plot the results of the margin and central line detection (green lines), the horizontal separators (red boxes), and the text boxes (blue boxes). In both cases, a few small errors remain, but the process is robust enough to allow the extraction of the text.
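The layout steps described above (labeling text lines, using the detected central line, ordering the lines, and grouping them into text boxes) can be illustrated with a simple geometric heuristic. The sketch below assumes that each text line comes with a bounding box in page coordinates (y growing downward) and that the central column separator (`center_x`) and the bottom of the header zone (`header_y`) have already been detected; the `Line` class, the function names, and the label strings are hypothetical, and footnotes and text inside headers are left out for brevity. It is not the project's actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Line:
    """A text line with its bounding box in page coordinates (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    label: str = ""


def label_lines(lines: list[Line], center_x: float, header_y: float) -> None:
    """Label each line by its position relative to the (assumed) detected layout lines."""
    for ln in lines:
        if ln.y1 <= header_y:
            ln.label = "header"
        elif ln.x0 < center_x < ln.x1:
            ln.label = "single_column"   # the line crosses the central separator
        elif ln.x1 <= center_x:
            ln.label = "col1"
        else:
            ln.label = "col2"


def reading_order(lines: list[Line]) -> list[Line]:
    """Headers first; then, within each band delimited by full-width lines,
    the left column top to bottom followed by the right column."""
    headers = sorted((ln for ln in lines if ln.label == "header"), key=lambda ln: ln.y0)
    body = sorted((ln for ln in lines if ln.label != "header"), key=lambda ln: ln.y0)
    ordered: list[Line] = list(headers)
    band: list[Line] = []

    def flush_band() -> None:
        for col in ("col1", "col2"):
            ordered.extend(ln for ln in band if ln.label == col)
        band.clear()

    for ln in body:
        if ln.label == "single_column":
            flush_band()          # a full-width line closes the current two-column band
            ordered.append(ln)
        else:
            band.append(ln)
    flush_band()
    return ordered


def group_boxes(ordered: list[Line], max_gap: float) -> list[list[Line]]:
    """Start a new text box when the label changes or the vertical gap exceeds max_gap."""
    boxes: list[list[Line]] = []
    for ln in ordered:
        prev = boxes[-1][-1] if boxes else None
        if prev is None or prev.label != ln.label or ln.y0 - prev.y1 > max_gap:
            boxes.append([ln])
        else:
            boxes[-1].append(ln)
    return boxes


# Toy page with made-up coordinates, given in scrambled order.
page = [
    Line("Nationalrat", 250, 20, 350, 35),
    Line("Left column, line 2", 50, 120, 280, 135),
    Line("Right column, line 1", 320, 100, 550, 115),
    Line("Left column, line 1", 50, 100, 280, 115),
    Line("Full-width heading", 50, 150, 550, 165),
    Line("Left column, line 3", 50, 180, 280, 195),
]
label_lines(page, center_x=300, header_y=50)
ordered = reading_order(page)
print([ln.text for ln in ordered])
# prints ['Nationalrat', 'Left column, line 1', 'Left column, line 2',
#         'Right column, line 1', 'Full-width heading', 'Left column, line 3']
print([len(box) for box in group_boxes(ordered, max_gap=10)])
# prints [1, 2, 1, 1, 1]
```

Treating a full-width line as a horizontal separator keeps the left column of one band from being interleaved with the right column of another. Real scanned pages also need tolerances for skew and OCR noise, which this sketch ignores.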

After the extraction of the corpus in WP1, we can proceed to the most interesting part of the project: natural language processing. This constitutes WP2 of the DemocraSci project, whose main aim is to extract additional entities that further enrich the envisioned knowledge graph, such as the topics discussed and their historical evolution, or the opinions of politicians on different matters. For these tasks, we can rely on well-established techniques. For example, latent Dirichlet allocation (LDA) is a probabilistic topic model that represents each document as a mixture of topics and each topic as a distribution over words. The most probable words of a topic form a list of specific terms that can be assigned to a category, i.e., a topic, and each document's topic mixture tells us how strongly it is associated with each of these lists. This way, we can quickly summarize the main topics discussed during each session, as well as list the politicians proposing them and those intervening in the discussions.

Given the size and uniqueness of the data set, we can also apply more advanced methods. For example, dynamic LDA will allow us to analyze how topics evolved through time, e.g., how the rhetoric on women’s rights or on Swiss energy policy changed. We will also use deep recurrent neural networks to perform sentiment analysis on the speeches, to elucidate not only the topics discussed by specific politicians but also their opinions on those subjects.

All the information we gather from the corpus will be integrated into the knowledge graph, which will comprise thousands of entities and the relations between them. Once ready, the graph will be hosted on an interactive web application where researchers, journalists, and the interested public can explore the data and perform their own analyses. They will be able to browse the topics and find out what was discussed in parliament and when.
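The LDA idea mentioned above can be sketched with collapsed Gibbs sampling, one standard way of fitting the model. This is a minimal, dependency-free illustration, not the project's implementation: the function names, the hyperparameters `alpha` and `beta`, and the toy token lists are assumptions, and a real pipeline would use an established library together with proper preprocessing (tokenization, stop-word removal, lemmatization). `nkw` holds the topic-word counts, from which the lists of relevant words are read off, and `ndk` holds the document-topic counts, i.e., each document's association with the topics.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists.
    Returns (nkw, ndk, vocab): topic-word counts, document-topic counts, vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]   # occurrences of word v assigned to topic k
    nk = [0] * n_topics                        # total tokens assigned to topic k
    z = []                                     # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, ndk, vocab


def top_words(nkw, vocab, n=3):
    """Return the n highest-count words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


# Toy "speeches" (made up for illustration).
docs = [
    "energy nuclear power energy plant".split(),
    "nuclear power energy grid".split(),
    "women suffrage vote rights women".split(),
    "suffrage women rights vote".split(),
]
nkw, ndk, vocab = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(nkw, vocab)):
    print(f"topic {k}: {words}")
```

Because the sampler is stochastic, the topic indices and exact word lists can vary with the seed; on real data, one would also inspect the resulting word lists manually before assigning topic labels to them.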
Users of the platform will also be able to explore which arguments were used to win a discussion in parliament, or to examine how politicians changed their positions over the course of their years of service. In addition, machine learning models could be trained on these data to predict future political outcomes. This data set thus opens up challenging research opportunities. By publishing it fully open access upon successful completion of the project, together with the methods and the documentation of how the processing was done, we hope to encourage other digitization projects and to strengthen scientific advances that help us better understand our past and shape our future.

Co-authors

  • Lilian Gasser
  • Laurence Brandenberger
  • Prof. Frank Schweitzer

About the author

Luis Salamanca
Lead Data Scientist

Luis is originally from Spain, where he completed his bachelor's studies in electrical engineering and an M.Sc. in signal theory and communications, both at the University of Seville. During his Ph.D., he began focusing on machine learning methods, more specifically message-passing techniques for channel coding and Bayesian methods for channel equalization. He carried out his Ph.D. between the University of Seville and the University Carlos III in Madrid, and also spent time at EPFL, Switzerland, and at Bell Labs, USA, where he worked on advanced techniques for optical channel coding. After completing his Ph.D. in 2013, he moved to the Luxembourg Centre for Systems Biomedicine, where he shifted his interest to neuroscience, neuroimaging, and the life sciences, and to the application of machine learning techniques to these fields. During his four and a half years there as a postdoc, he worked as a data scientist on many different problems, including microscopy image analysis, neuroimaging, and single-cell gene expression analysis. He joined the SDSC in April 2018. As Lead Data Scientist, Luis coordinates projects in various domains. Several projects focus on the application of natural language processing and knowledge graphs to the study of phenomena in the social and political sciences. In architecture and engineering, Luis is responsible for projects centered on the application of novel generative methods to parametric modeling. Finally, Luis also coordinates projects in robotics, ranging from collaborative robotic construction to deformable object manipulation.
