Web Scraping for a Database of Court Decision Related Documents

This project is available as a Seminar or Bachelor's project. This project is also available as a group project.

Introduction

Swiss court decisions are anonymized to protect the privacy of the involved people (parties, victims, etc.). Previous research [1] has shown that in certain cases it is possible to re-identify companies involved in court decisions by linking the rulings with external data. Our project tries to go a step further by building an automated system for re-identifying involved people from court rulings. This system can then be used as a test for the anonymization practice of Swiss courts. For more information regarding the overarching research project please go here.

For a successful re-identification of involved people in a court decision, we need external data. This project has the goal of building a well-structured database of external data connected to Swiss federal court rulings.

List of possible data sources:

Media messages of the courts (e.g. the federal court)
BAG Bulletin
Swiss Transporation Safety Investigation Board
Newspapers (SRF, swissdox, etc.)
Online phone books (e.g. local.ch)
Social Media (e.g. Twitter, Facebook)
Government documents
Trade registers/Business register (e.g. Moneyhouse)
Map Information systems/Land register

Research Questions

So far, there has not been collected a dataset of external documents related to Swiss federal court rulings.

RQ1: Which data sources are most likely to contain information also occurring in Swiss federal court rulings?

Steps

Identify promising data sources
Analyze the HTML and scrape the documents from the websites (using libraries such as Scrapy or BeautifulSoup)
Extract the text from the documents
Structure the documents in a database
Evaluate the results

Activities

⬤⬤⬤⬤◯ Programming

⬤⬤◯◯◯ Experimentation

⬤◯◯◯◯ Literature

Prerequisites

Good programming skills (preferably in Python)

Contact

Joel Niklaus

References

[1] Vokinger, K.N., Mühlematter, U.J., 2019. Re-Identifikation von Gerichtsurteilen durch «Linkage» von Daten(banken). Jusletter 27.

Digital Sustainability Group

Web Scraping for a Database of Court Decision Related Documents