LeAK 2 – Automatic anonymisation experiments

In LeAK 2 we experimented with a joint-learning (multitask) approach, i.e. a single NER tagger with multiple prediction heads (spans, entity classes and risk levels). We used the same corpus as in LeAK and combined both domains (landlord-tenant and transport) during training. The model achieved a recall of around 98.8% for high-risk spans and an overall F1-score of 96.56% (P = 96.88%, R = 96.24%). To evaluate how well the multitask method generalises, we conducted further experiments using verdicts from another German legal authority, the Oberlandesgericht (OLG, Higher Regional Court). The OLG corpus covers the following legal domains:

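  • Bankensachen (banking matters)
  • Bausachen (construction matters)
  • Beschwerdeverfahren (complaint proceedings)
  • Familiensachen (family matters)
  • Handelssachen (commercial matters)
  • Immaterialgüter (intellectual property)
  • Kapitalanlagesachen (capital investment matters)
  • Kostensachen (cost matters)
  • Schiedssachen (arbitration matters)
  • Verkehrsunfallsachen (traffic accident matters)
  • Zivilsachen (civil matters)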

This sample is not yet anonymised, so the evaluation was carried out in a secured environment. Since the current model has not learned the document structure of these new domains, we expected a drop in performance at test time. Still, the results table below suggests that the multitask model detects a high proportion of the text spans that carry a high risk of deanonymisation if disclosed, and that it could be improved by adding a small sample of these data to the training set.

                      Anonymised data points      Recall by risk level
Domain                Precision  Recall  F1       High  Medium  Low
Bankensachen          0.89       0.88    0.88     0.84  0.88    0.89
Bausachen             0.87       0.88    0.87     0.91  0.95    0.84
Beschwerdeverfahren   0.78       0.81    0.80     0.81  0.52    0.84
Familiensachen        0.86       0.86    0.86     0.91  0.68    0.85
Handelssachen         0.87       0.91    0.89     0.91  0.91    0.91
Immaterialgüter       0.70       0.69    0.69     0.73  0.62    0.73
Kapitalanlagesachen   0.79       0.85    0.82     0.91  0.89    0.81
Kostensachen          0.79       0.77    0.78     0.83  0.87    0.74
Schiedssachen         0.88       0.84    0.86     0.92  0.66    0.84
Verkehrsunfallsachen  0.84       0.90    0.87     0.87  0.93    0.91
Zivilsachen           0.83       0.84    0.83     0.82  0.85    0.85
Avg.                  0.83       0.84    0.83     0.86  0.80    0.84
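
The summary figures above can be re-derived from the reported numbers. The following is a minimal sketch, not the project's evaluation code: it assumes that F1 is the usual harmonic mean of precision and recall, and that the Avg. row is the unweighted (macro) mean over the eleven domains. The names f1_score, OLG_RESULTS, COLUMNS and macro_average are hypothetical and introduced only for this illustration; the values are copied from the table.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Column order of each row below (hypothetical names for this sketch).
COLUMNS = ("precision", "recall", "f1",
           "recall_high", "recall_medium", "recall_low")

# Per-domain values copied from the results table.
OLG_RESULTS = {
    "Bankensachen":         (0.89, 0.88, 0.88, 0.84, 0.88, 0.89),
    "Bausachen":            (0.87, 0.88, 0.87, 0.91, 0.95, 0.84),
    "Beschwerdeverfahren":  (0.78, 0.81, 0.80, 0.81, 0.52, 0.84),
    "Familiensachen":       (0.86, 0.86, 0.86, 0.91, 0.68, 0.85),
    "Handelssachen":        (0.87, 0.91, 0.89, 0.91, 0.91, 0.91),
    "Immaterialgüter":      (0.70, 0.69, 0.69, 0.73, 0.62, 0.73),
    "Kapitalanlagesachen":  (0.79, 0.85, 0.82, 0.91, 0.89, 0.81),
    "Kostensachen":         (0.79, 0.77, 0.78, 0.83, 0.87, 0.74),
    "Schiedssachen":        (0.88, 0.84, 0.86, 0.92, 0.66, 0.84),
    "Verkehrsunfallsachen": (0.84, 0.90, 0.87, 0.87, 0.93, 0.91),
    "Zivilsachen":          (0.83, 0.84, 0.83, 0.82, 0.85, 0.85),
}


def macro_average(results: dict) -> dict:
    """Unweighted mean of every column across domains, rounded to 2 decimals."""
    per_column = zip(*results.values())  # transpose rows into columns
    return {name: round(mean(col), 2) for name, col in zip(COLUMNS, per_column)}


if __name__ == "__main__":
    # F1 for the combined LeAK corpus from the reported P and R.
    print(round(f1_score(96.88, 96.24), 2))  # 96.56
    print(macro_average(OLG_RESULTS))
```

Under these assumptions the sketch reproduces both the reported F1 of 96.56% and every entry of the Avg. row. Because a macro average gives each domain equal weight regardless of how many spans it contains, a weighted average over all annotated spans could differ from the figures shown.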