Resources

Web Apps

Shiny apps

Explore semantic similarity graphs

Interactive viewers

E-VIEW-Alation (interactive visualization of collocation identification quality)

  • eLex 2017 evaluation of 20 association measures, 13 corpora, 8 context sizes and 4 frequency thresholds on 2 gold standards

Toys

Internal access

Corpus Access

Public interfaces

Login required

Internal access

Software & Data

Python packages

  • SoMaJo – A tokenizer and sentence splitter for German and English web and social media texts.
  • SoMeWeTa – A part-of-speech tagger with support for domain adaptation and external resources.
  • pandas-association-measures – Statistical association measures for co-occurrence dataframes in pandas.
  • cwb-ccc – A CWB wrapper to extract concordances and collocates.

Data

  • GeRedE – A corpus of German Reddit exchanges.
  • EmpiriST 2.0 – A manually annotated corpus consisting of German web pages and German computer-mediated communication (CMC).