Talk: Timm Weber (25 June 2025)
As part of the Oberseminar Computerlinguistik, a talk will take place on 25 June 2025, to which we cordially invite you.
Speaker: Timm Weber (FAU, Lehrstuhl für Korpus- und Computerlinguistik)
Time: Wednesday, 25 June 2025, 16:15–17:45
Location: CIP-Pool Computerlinguistik, Bismarckstr. 12, room 0.320
Topic: “Handling Large Text Corpora and the future of the Corpus Workbench”
This talk gives an overview of the internal encoding and indexing paradigms used by the IMS Open Corpus Workbench (CWB) and discusses recent steps towards its next major update, CWB version 4. It presents Ziggurat, a corpus data model for very large, richly annotated text corpora, together with an experimental implementation of this model written in Rust. Key findings from my Bachelor’s thesis are used to discuss how the performance of this implementation scales to very large corpora compared with the current version of CWB. Finally, the talk outlines a potential roadmap for Ziggurat and the path towards CWB4.