Talk: Natalie Finlayson, Michaela Mahlberg, Aleksandr Piperski, Stephanie Evert (7 June 2023)

As part of the Computational Linguistics research seminar (Oberseminar Computerlinguistik), a talk will take place on 7 June 2023, to which we warmly invite you.



Natalie Finlayson & Michaela Mahlberg (U of Birmingham), Aleksandr Piperski & Stephanie Evert (FAU)



Wednesday, 7 June 2023, 16:15-17:45



Bismarckstr. 12, Room 0.320 (in person) / also via Zoom (the link will be circulated via the university's internal mailing lists; external participants are welcome to register via info@linguistik.uni-erlangen.de)



“RC21 – Towards computer-assisted concordance reading”



As one of the most fundamental and central techniques of corpus linguistics, concordance analysis supports the identification of recurrent patterns across the occurrences of a search term, phrase or construction (the ‘node’). This is achieved by organising concordance lines according to similarities that become visible through the ‘kwic’ (keyword in context) display format. The key challenge is that the underlying notion of ‘similarity’ is often not clearly defined, and different research questions and applications demand a focus on different aspects of similarity. Typically, choices for the organisation of concordances are determined by the intuition of experienced analysts, but also strongly driven by the options offered by specific concordance software tools – in particular sorting the right or left context of the node alphabetically. An early approach towards a systematic account of ‘reading concordances’ was proposed by John Sinclair, but these ideas have only selectively been taken forward and concordance reading is still not being taught methodically in the corpus linguistics curriculum.

In this talk we want to look at opportunities for enhancing concordance analysis with suitable computational algorithms. Based on examples of existing corpus tools and case studies, we will review current practice in corpus linguistics to arrive at an understanding of how the affordances of current tools work together with qualitative interpretation. We will outline what we see as the fundamental tool-independent principles of ‘selecting’, ‘ranking’, ‘clustering’ and ‘sorting’, and demonstrate what we consider to be useful applications of these principles. Our talk will be illustrated with textual examples from corpora of fiction from both English and German authors. Functionalities we will specifically discuss in this case study build on our previous work on CLiC and IMS Corpus Workbench.
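
For readers unfamiliar with the KWIC format, the alphabetical sorting of the left or right context mentioned in the abstract can be sketched as follows. This is only a minimal illustration under simplifying assumptions (whitespace tokenisation, a fixed context window, a made-up example sentence); the function names `kwic`, `sort_by_right`, `sort_by_left` and `format_line` are hypothetical and do not reflect how CLiC or the IMS Corpus Workbench implement these operations.

```python
def kwic(tokens, node, window=4):
    """Collect (left context, node, right context) for each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines


def sort_by_right(lines):
    """Sort alphabetically by the right context, first word after the node first."""
    return sorted(lines, key=lambda line: [t.lower() for t in line[2]])


def sort_by_left(lines):
    """Sort alphabetically by the left context, word nearest to the node first."""
    return sorted(lines, key=lambda line: [t.lower() for t in reversed(line[0])])


def format_line(line, width=30):
    """Render one concordance line with the node aligned in a fixed column."""
    left, node, right = line
    left_text = " ".join(left)
    right_text = " ".join(right)
    return f"{left_text:>{width}}  {node}  {right_text}"


# Invented example text (an assumption for illustration, not from any corpus).
text = "She put her hand on the table . He took her hand in his . Her hand was cold ."
tokens = text.split()

for line in sort_by_right(kwic(tokens, "hand")):
    print(format_line(line))
```

Sorting on the right context groups lines such as "hand in" and "hand on" next to each other, which is what makes recurrent patterns around the node visible. Other ways of organising the lines would replace the sort key with a different notion of similarity.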