Web Apps
Shiny apps
Explore semantic similarity graphs
- word2vec (200k word forms)
- various Wikipedia-based DSMs (50k lemmas)
Interactive viewers
- E-VIEW-Alation (interactive visualization of collocation identification quality)
  - eLex 2017 evaluation of 20 association measures, 13 corpora, 8 context sizes and 4 frequency thresholds on 2 gold standards
Toys
- TextParrot (random text generator trained on the Brown corpus)
- CWB Wordlist Explorer (search word lists with PCRE regular expressions)
Internal access
Corpus Access
Public interfaces
- CQP Web demos (Dickens, Europarl, Bundestag)
- Google Web 1T 5-Grams (Web1T5-Easy interface)
Login required
- German newspaper corpora (CQPDemo interface)
Internal access
Software & Data
Python packages
- SoMaJo – A tokenizer and sentence splitter for German and English web and social media texts.
- SoMeWeTa – A part-of-speech tagger with support for domain adaptation and external resources.
- pandas-association-measures – Statistical association measures for co-occurrence dataframes in pandas.
- cwb-ccc – A CWB wrapper to extract concordances and collocates.
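For readers unfamiliar with association measures: they compare the observed co-occurrence frequency of a word pair with the frequency expected if the two words were independent. The following is a minimal sketch in plain Python and does not use the API of pandas-association-measures or cwb-ccc. The function names, parameter names and counts are hypothetical and chosen for illustration only.

```python
import math


def expected_frequency(f1, f2, n):
    """Co-occurrence frequency expected under independence.

    f1, f2: marginal frequencies of the two words (hypothetical names)
    n:      sample size, e.g. the number of co-occurrence positions
    """
    return f1 * f2 / n


def pmi(f, f1, f2, n):
    """Pointwise mutual information (base 2) of observed frequency f."""
    if f == 0:
        # The pair was never observed together: log2(0) diverges to -inf.
        return float("-inf")
    return math.log2(f / expected_frequency(f1, f2, n))


# Illustrative counts, not taken from any corpus listed here.
print(round(pmi(10, 100, 100, 10_000), 2))  # 3.32
```

A positive score means the pair occurs together more often than chance would predict; a score of 0 means observed and expected frequencies coincide.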
Data
- GeRedE – A corpus of German Reddit exchanges.
- EmpiriST 2.0 – A manually annotated corpus consisting of German web pages and German computer-mediated communication (CMC).