Talk: Marc Kupietz (07.02.2024)

As part of the Oberseminar Computerlinguistik, a talk will be held on 07.02.2024, and we cordially invite you to attend.



Dr. Marc Kupietz (Leibniz-Institut für Deutsche Sprache, Mannheim)



Wednesday, 07.02.2024, 16:15-17:45



Bismarckstr. 12, R.0.320 (in person) / also via Zoom (the link will follow via the university's internal mailing lists; external participants are welcome to register at info@linguistik.uni-erlangen.de)



Some Challenges in Corpus Linguistics and IDS Mannheim approaches to tackle them



In recent years, corpus linguistics has become an indispensable aspect of numerous linguistic sub-disciplines, to the point where the term seems almost interchangeable with linguistics itself. Moreover, the increasing availability of digital texts on the internet, coupled with the recent extension of the text and data mining exception in EU copyright law, has rendered creating large linguistic corpora a feasible undertaking. However, despite these upbeat developments, some challenges persist and some new ones may have emerged. In my presentation, I will address some of the methodological challenges, such as how to derive linguistic findings about language domains from corpus findings, how to cope with errors in corpora and annotations, how to utilise language models to derive promising linguistic hypotheses, and how to gain comparative linguistic insights from multilingual corpora. For each of these questions, I will present approaches adopted by the corpus linguistics programme area at the IDS and discuss them specifically with regard to the equally still persistent logistical and economic challenges of feasibility and sustainability.