Corpus tools and language technology

We develop algorithms and software tools for the automatic linguistic annotation, efficient indexing, flexible querying and quantitative analysis of large text corpora. These tools underpin research in the digital humanities as well as practical and commercial applications in language technology.

Project funding

Key publications


  • Empirikom Shared Task (EmpiriST 2015) on tokenization and POS tagging of German web corpora and computer-mediated communication