  1. Friedrich-Alexander-Universität
  2. Philosophische Fakultät und Fachbereich Theologie
  3. Department Germanistik und Komparatistik

Corpus tools and language technology

We develop algorithms and software tools for the automatic linguistic annotation, efficient indexing, flexible querying, and quantitative analysis of large text corpora. These tools form the basis of innovative research in the digital humanities as well as of practical and commercial applications in language technology.

Project funding

  • Reconstructing Arguments from Noisy Text (RANT)
    (01/2018 – 12/2020)
  • RogTCS – text clustering for the analysis of open questions in market research
    (03/2013 – 09/2014)

Key publications

  • Evert, Stefan; Greiner, Paul; Baigger, João Filipe; Lang, Bastian (2016). A distributional approach to open questions in market research. Computers in Industry 78, 16–28.
  • Evert, Stefan and Hardie, Andrew (2015). Ziggurat: A new data model and indexing format for large annotated text corpora. In Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3), pages 21–27, Lancaster, UK.
    ☞  specification & further information
  • Proisl, Thomas (2018). SoMeWeTa: A part-of-speech tagger for German social media and web texts. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.
    ☞  source code (GitHub)
  • Kabashi, Besim and Proisl, Thomas (2018). Albanian part-of-speech tagging: Gold standard and evaluation. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.

Events

  • Empirikom Shared Task (EmpiriST 2015) on tokenization and POS tagging of German web corpora and computer-mediated communication

Computational Corpus Linguistics
Prof. Dr. Stephanie Evert

Bismarckstraße 6
91054 Erlangen
Germany