Forschungsstelle Digitale Nachhaltigkeit

Seminar on Natural Language Processing (NLP)

This seminar provides a conceptual and practical introduction to modern Natural Language Processing (NLP) methods and technologies. Each lecture introduces a new NLP approach based on a seminal publication and includes a presentation by an academic guest speaker. The NLP methods covered include bag-of-words (BoW), term frequency–inverse document frequency (TF-IDF), word2vec, long short-term memory (LSTM), latent Dirichlet allocation (LDA), transformers, BERT, and GPT-3.

Before each lecture, the students have to read the indicated research article and prepare a key question for the discussion. In addition, each student has to conduct and eventually present a personal project related to NLP. This seminar is mandatory for all students writing a bachelor's or master's thesis at the Research Center for Digital Sustainability.

Time, Location, and Links

Schedule 2021

Date Topic Mandatory Paper or Blog Post Speakers
24 September 2021 Overview and introduction, NRP77 project on re-identification of Swiss judgments, presentation of topics for a thesis project   Joel Niklaus and Matthias Stürmer, University of Bern
1 October 2021 Bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF)

Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut - A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques, https://arxiv.org/abs/1707.02919

Dominic Schweizer, University of Bern
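As a warm-up for this session, the two weighting schemes can be sketched in a few lines of pure Python. This is a minimal illustration with made-up toy documents, using the plain log-IDF variant (real implementations differ in smoothing and normalization details):

```python
import math
from collections import Counter

docs = [
    "the court dismissed the appeal",
    "the appeal was upheld by the federal court",
    "language models process text",
]

# Bag-of-words: each document becomes an unordered term -> count mapping
bows = [Counter(doc.split()) for doc in docs]

# Inverse document frequency: terms appearing in fewer documents get higher weight
vocab = {t for bow in bows for t in bow}
n = len(docs)
idf = {t: math.log(n / sum(1 for bow in bows if t in bow)) for t in vocab}

# TF-IDF: relative term frequency within a document times the term's IDF
tfidf = [
    {t: (c / sum(bow.values())) * idf[t] for t, c in bow.items()}
    for bow in bows
]
```

A frequent but unspecific word like "the" ends up with a low weight, while a word unique to one document, like "language", scores high in that document.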
8 October 2021 word2vec

https://jalammar.github.io/illustrated-word2vec/

https://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
Prof. Dr. Tobias Hodel, Digital Humanities University of Bern
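To get a feel for what word2vec actually trains on: the skip-gram variant slides a window over the text and extracts (center, context) word pairs, then learns embeddings that predict the context from the center word. A minimal sketch of the pair-generation step, on a toy sentence:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs as used by skip-gram word2vec."""
    pairs = []
    for i, center in enumerate(tokens):
        # Every token within the window around position i is a context word
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the court dismissed the appeal".split(), window=1)
```

The embedding training itself (a shallow network optimized with negative sampling) is covered in the illustrated guide linked above.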
15 October 2021 Presentation of student project proposals

Maximum 5 min per student!
Possible talking points:
What methods do you plan to use?
What is the approximate timeline?
What does the data look like?
What is special about your courts?

Students
22 October 2021 Recurrent Neural Networks

LSTM: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Lecture Materials: https://drive.google.com/drive/folders/1ldet--Yjo6xos_cNnpGiqXmnV5OW-a3z 
Exercise Solutions: https://colab.research.google.com/drive/16dQdAhYfOZbPEAe-nsHOK8NuntLt0x2T?usp=sharing

Dr. Mathias Müller, Postdoc and Lecturer at University of Zürich
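The gate equations from the "Understanding LSTMs" post above can be written out directly. Below is a minimal NumPy sketch of a single LSTM cell step with the four gate blocks stacked into one matrix multiply; weight shapes are illustrative assumptions, not tied to any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias, with the i/f/o/g blocks stacked row-wise."""
    z = W @ x + U @ h_prev + b       # all four pre-activations at once
    H = h_prev.shape[0]
    i = sigmoid(z[:H])               # input gate: how much new info to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much state to expose
    g = np.tanh(z[3 * H:])           # candidate cell update
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c
```

Running the step over a sequence just means feeding each `(h, c)` back in as `(h_prev, c_prev)` for the next token.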
29 October 2021 ML and NLP in industry

Technical Debt: https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
MLOps: https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf
(Compressible Subspace: https://arxiv.org/pdf/2110.04252.pdf)

Siddhartha Singh
5 November 2021 Building Knowledge Graphs using NLP https://towardsdatascience.com/the-building-a-large-scale-accurate-and-fresh-knowledge-graph-71ebd912210e Prof. Dr. Patrizio Collovà, Bern University of Applied Sciences
12 November 2021 Latent Dirichlet allocation (LDA)   Silvia Terragni, PhD student at University of Milano-Bicocca
19 November 2021 Transformers   Joel Niklaus and Matthias Stürmer, University of Bern
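The core operation behind this session's topic is scaled dot-product attention: each query scores all keys, the scores are turned into a probability distribution, and the values are averaged accordingly. A minimal NumPy sketch (single head, no masking, toy shapes):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query attends to all keys."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                     # weighted sum of values
```

A full transformer runs this in parallel over several heads and alternates it with position-wise feed-forward layers, but every variant discussed in the lecture reduces to this building block.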
26 November 2021 GPT-3   Dr. Simon Clematide, Academic Associate at University of Zurich
3 December 2021 BERT   Dr. Ilias Chalkidis, NLP Postdoctoral Researcher at University of Copenhagen 
10 December 2021 Student final presentations

Maximum 10 min per student!
Possible talking points:
What are your results (e.g. coverage)?
What were the difficulties you faced and how did you deal with them?
What methods worked best?
What did you learn?

17 December 2021 Student final presentations

Maximum 10 min per student!
Possible talking points:
What are your results (e.g. coverage)?
What were the difficulties you faced and how did you deal with them?
What methods worked best?
What did you learn?
24 December 2021 No lecture