MMDA toolkit

Exploring the “Fukushima Effect”

From CCDA to MMDA – Our Methodological Vision

We take the method of corpus-based critical discourse analysis (CCDA) as a starting point for the desired multi-disciplinary integration. Similar to CCDA, we operationalize a discourse as the pairing of a topic (e.g. nuclear energy) with certain attitudes or opinions towards this topic (e.g. a fear of nuclear disasters). We also make the fundamental assumption that both topics and attitudes can be characterized by suitable patterns of lexical items (words or fixed expressions such as slogans) – an assumption shared, amongst others, by CCDA, topic modelling and sentiment analysis.

Our methodology thus puts the human analyst at the centre of a highly dynamic procedure that integrates state-of-the-art approaches from multiple disciplines, qualitative and quantitative perspectives, as well as human, semi-automatic and fully automatic analyses. The CCDA core of our procedure (entitled “MMDA” – Mixed Methods Discourse Analysis) makes use of corpus-linguistic methods developed and refined by our research group, including distributional similarity, multivariate analysis of correlational patterns and improved approaches to collocation analysis, as well as state-of-the-art techniques from natural language processing such as sentiment analysis and topic clustering.

These methods are complemented by the theoretical background and insights of a human analyst rooted in the discipline of cultural studies. The human analyst ideally brings along expertise in both the discourse to be analyzed and the techniques of the tool, merging into a hermeneutic cyborg (cf. Stefan Evert’s Sinclair Lecture “The Hermeneutic Cyborg”).


It is worth mentioning that although CCDA reduces research bias – since the researcher is confronted with empirical data – this triangulation comes with an intense workload. At present, the main functionalities implemented in the MMDA toolkit are designed to ease the manual work of hermeneutic interpretation. Since a major task of interpretation involves categorizing collocates of a topic query into groups, the toolkit assists by visualizing the results of the collocation analysis in an appropriate manner. To this end, we project high-dimensional word vectors onto a two-dimensional space. For an outcome of this process, see the following figure, which displays collocates of the topic “nuclear energy” in Japanese Twitter data (operationalized by the regular expression “(原子*)|(原発)”).
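The projection step can be sketched as follows. This is a minimal stand-in, not the toolkit’s actual code: it uses randomly initialized vectors in place of real word embeddings, and all names are illustrative. The dimensionality reduction itself uses t-SNE from scikit-learn, as in the backend described below.

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative stand-in for real embeddings: in the toolkit, each
# collocate would come with a high-dimensional word vector.
rng = np.random.default_rng(42)
collocates = [f"word_{i}" for i in range(20)]
vectors = rng.normal(size=(20, 100))  # 20 collocates, 100 dimensions

# Project onto two dimensions for plotting; perplexity must be
# smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=5, init="random", random_state=42)
coordinates = tsne.fit_transform(vectors)

# Each collocate now has (x, y) coordinates for the visualization.
for word, (x, y) in zip(collocates, coordinates):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

In the actual toolkit these coordinates are delivered to the frontend, where nearby points indicate semantically similar collocates.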

Semantically similar words are automatically grouped together; the hermeneutic interpreter can thus rely on pre-grouped words to form categories. Furthermore, the size of the lexical items reflects their statistical association with the topic node “(原子*)|(原発)” as measured by log-likelihood. The toolkit supports interactive grouping of the collocates as well as a range of options for tweaking the visualization, including a choice of different association measures and window sizes for the initial collocation analysis.
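The log-likelihood association measure mentioned above can be sketched as the G² statistic over a 2×2 contingency table of node and collocate frequencies. The frequency counts below are hypothetical, chosen only to illustrate the computation:

```python
import math

def log_likelihood(f_xy, f_x, f_y, N):
    """G^2 association score for a node-collocate pair.

    f_xy: co-occurrence frequency of node and collocate
    f_x:  frequency of the node
    f_y:  frequency of the collocate
    N:    total number of tokens (or co-occurrence windows)
    """
    # observed cell frequencies of the 2x2 contingency table
    O = [f_xy, f_x - f_xy, f_y - f_xy, N - f_x - f_y + f_xy]
    # expected cell frequencies under independence
    E = [f_x * f_y / N,
         f_x * (N - f_y) / N,
         (N - f_x) * f_y / N,
         (N - f_x) * (N - f_y) / N]
    # zero cells contribute nothing (0 * log 0 := 0)
    return 2 * sum(o * math.log(o / e) for o, e in zip(O, E) if o > 0)

# hypothetical counts: a collocate seen 30 times near the topic node
score = log_likelihood(f_xy=30, f_x=1000, f_y=200, N=1_000_000)
print(f"G2 = {score:.2f}")
```

The further the observed co-occurrence frequency exceeds the expected one, the larger the score, and hence the larger the item appears in the visualization.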

Further advantages of MMDA can be realized by interacting with the toolkit: having grouped collocates together, the researcher can display discourse concordances, i.e. concordance lines in which the topic node co-occurs with a member of the group, and start additional collocation analyses in order to triangulate the semantics of the discourse at hand.
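The idea of a discourse concordance can be sketched with a simplified in-memory stand-in for the CWB-backed implementation; the function name, toy corpus and group are invented for illustration:

```python
import re

def discourse_concordance(tokens, topic_pattern, group, window=5):
    """Return context windows around topic matches that also
    contain a member of the collocate group (simplified stand-in
    for the CWB-backed concordancer)."""
    topic = re.compile(topic_pattern)
    hits = []
    for i, tok in enumerate(tokens):
        if topic.fullmatch(tok):
            left, right = max(0, i - window), i + window + 1
            context = tokens[left:right]
            # keep the line only if a group member occurs in the window
            if any(t in group for t in context if t != tok):
                hits.append(" ".join(context))
    return hits

# hypothetical toy corpus
tokens = ("the fear of nuclear disaster after fukushima grew while "
          "support for nuclear power declined").split()
results = discourse_concordance(tokens, r"nuclear", {"fear", "disaster"})
print(results)
```

Only the first occurrence of “nuclear” is returned here, since only its window contains a member of the group – precisely the (topic, group) restriction described above.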

Demo and Implementation Details

We have a demo of our toolkit up and running from within the FAU network. Drop us a line if you want to have access.

The backend is implemented in Python using Flask. It builds upon the IMS Open Corpus Workbench (CWB) and the UCS toolkit. All low-level CWB calls are abstracted in Python, mostly via Pandas DataFrames and native Python classes. At this stage, all collocates extracted from a corpus are processed with computational-linguistic techniques such as word2vec; dimensionality reduction is done using t-distributed stochastic neighbour embedding (t-SNE). Finally, a standardized HTTP API is provided using Flask. This interface abstracts complex operations and delivers structured data to a frontend for visualisation. The Python backend documentation is available via Sphinx.
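The HTTP layer can be illustrated with a minimal Flask sketch. The endpoint path, payload shape and the hard-coded example data are invented for illustration; the real API is backed by CWB/UCS analyses rather than static data:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative stand-in for the CWB/UCS-backed analysis: in the real
# toolkit, a collocation analysis would be run on the corpus and the
# resulting items projected to 2D coordinates.
FAKE_COLLOCATES = [
    {"item": "disaster", "score": 812.4, "x": -1.2, "y": 0.7},
    {"item": "phase-out", "score": 455.9, "x": 0.8, "y": -0.3},
]

@app.route("/api/collocates/<topic>")
def collocates(topic):
    # deliver structured data (collocate, association score,
    # 2D coordinates) to the frontend for visualization
    return jsonify({"topic": topic, "collocates": FAKE_COLLOCATES})

# Start the development server with: app.run(debug=True)
```

Keeping the API stateless and JSON-based is what allows the Vue.js frontend described below to remain a fully decoupled client.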

The current (August 2018) frontend implementation uses Vue.js and Konva. It depends on the backend described above and is used for visualization of and interaction with the data provided. A modern Material Design UI was created for this purpose, based on the open-source JavaScript framework Vue.js; the HTML5 2D canvas library Konva handles the visualization of the data.

Both frontend and backend are developed on GitLab.

Qualitative results

First qualitative results using the methods implemented in MMDA can be found in our paper “A Transnational Analysis of News and Tweets about Nuclear Phase-Out in the Aftermath of the Fukushima Incident” in the proceedings of the LREC Workshop on Computational Impact Detection from Text Data. Further results will be presented at the Fourth Asia Pacific Corpus Linguistics Conference (“Extending Corpus-Based Discourse Analysis for Exploring Japanese Social Media”).