Methodological foundations of corpus research and digital humanities
Corpus research in linguistics, the digital humanities, and the social sciences relies on a wide range of statistical techniques and visualizations. A central goal of our research is to develop sound methodological foundations for corpus linguistics that address key problems and ensure that quantitative analyses are both reliable and meaningful.
- Quantitative methodology for literary stylometry (e-Humanities-Zentrum KALLIMACHOS)
- KALLIMACHOS Centre for Digital Humanities: corpus-linguistic approaches and statistical methodology (phase 1), linguistic complexity in literary stylometry (phase 2)
(10/2014 – 09/2019)
- Efficient simulation experiments for large-scale parameter optimisation of machine learning approaches in natural language processing
(10/2016 – 09/2017)
- Evert, Stefan; Proisl, Thomas; Jannidis, Fotis; Reger, Isabella; Pielström, Steffen; Schöch, Christof; Vitt, Thorsten (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities 32(suppl_2), ii4–ii16.
- Evert, Stefan and Neumann, Stella (2017). The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German. In G. De Sutter, M.-A. Lefer, and I. Delaere (eds.), Empirical Translation Studies. New Methodological and Theoretical Traditions (TiLSM 300), pages 47–80. Mouton de Gruyter, Berlin.
☞ online supplement
- Evert, Stefan; Wankerl, Sebastian; Nöth, Elmar (2017). Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch. In Proceedings of the Corpus Linguistics 2017 Conference, Birmingham, UK.
- Evert, Stefan and Arppe, Antti (2015). Some theoretical and experimental observations on naïve discriminative learning. In Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6), Tübingen, Germany.
- Baroni, Marco and Evert, Stefan (2007). Words and echoes: Assessing and mitigating the non-randomness problem in word frequency distribution modeling. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), pages 904–911, Prague, Czech Republic.
- Evert, Stefan (2006). How random is a corpus? The library metaphor. Zeitschrift für Anglistik und Amerikanistik 54(2), 177–190.
- Open-source course on Statistical Inference – A Gentle Introduction for (Computational) Linguists (LinC 2018, Birmingham 2016, MaLT 2015, Zürich 2010, EMA 2008, DGfS/CL 2007, …)
- Tutorial / course on Type-Token Distributions & Zipf’s Law (LREC 2018, Birmingham 2018, ESSLLI 2006)