Results


Initial results with Amtsgericht (AG) data

We fine-tuned GottBERT, a German pretrained language model, on this corpus and achieved an F1 of 84% over all anonymised data points on our test set, which indicates great potential for automatic anonymisation. In particular, we reached a recall of 96% for PII (personally identifiable information) text spans.

| Model    | Precision (spans) | Recall (spans) | F1 (spans) | Recall (high risk) | Recall (medium risk) | Recall (low risk) |
|----------|-------------------|----------------|------------|--------------------|----------------------|-------------------|
| GottBERT | 0.80              | 0.90           | 0.84       | 0.96               | 0.80                 | 0.89              |
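For readers who want to reproduce this kind of setup, the sketch below shows how such a fine-tuning run could look with the Hugging Face transformers library, treating anonymisation as BIO-style token classification. The simplified label set and the `train_ds`/`eval_ds` datasets are illustrative assumptions, not our exact configuration.

```python
# Minimal sketch: fine-tuning GottBERT for anonymisation as BIO token
# classification. Label set and dataset layout are assumptions.
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

LABELS = ["O", "B-PII", "I-PII"]  # simplified illustrative tag set

tokenizer = AutoTokenizer.from_pretrained("uklfr/gottbert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "uklfr/gottbert-base", num_labels=len(LABELS))

def encode(example):
    # Tokenise pre-split words and align BIO labels to subword tokens;
    # special tokens get -100 so the loss ignores them.
    enc = tokenizer(example["tokens"], is_split_into_words=True,
                    truncation=True, max_length=512)
    enc["labels"] = [-100 if wid is None else example["tags"][wid]
                     for wid in enc.word_ids()]
    return enc

# train_ds / eval_ds: assumed datasets with "tokens" (word lists) and
# "tags" (per-word label indices), e.g. built with the datasets library.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gottbert-anon",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds.map(encode, remove_columns=["tokens", "tags"]),
    eval_dataset=eval_ds.map(encode, remove_columns=["tokens", "tags"]),
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```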

To make the model more robust, we also built a joint-learning model trained to solve three different tasks simultaneously on the same corpus: span detection, entity classification, and risk prediction. With this model we achieved a recall of around 98.8% for PII text spans and an overall F1 of 96.56% (precision = 96.88%, recall = 96.24%) across all AG domains.
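Conceptually, such a joint model can be pictured as one shared encoder with three token-level heads whose losses are added. The PyTorch sketch below illustrates this idea; the label counts and the plain summed loss are assumptions for illustration, not the project's exact architecture.

```python
# Illustrative sketch of joint learning: a shared GottBERT encoder with
# three token-level heads; the three task losses are simply summed.
import torch.nn as nn
from transformers import AutoModel

class MultitaskAnonymiser(nn.Module):
    def __init__(self, encoder_name="uklfr/gottbert-base",
                 n_span=3, n_entity=10, n_risk=4):  # assumed label counts
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.span_head = nn.Linear(hidden, n_span)      # BIO span detection
        self.entity_head = nn.Linear(hidden, n_entity)  # entity type per token
        self.risk_head = nn.Linear(hidden, n_risk)      # risk level per token
        self.loss = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask,
                span_labels=None, entity_labels=None, risk_labels=None):
        # Shared contextual representations feed all three heads
        h = self.encoder(input_ids,
                         attention_mask=attention_mask).last_hidden_state
        logits = (self.span_head(h), self.entity_head(h), self.risk_head(h))
        if span_labels is None:
            return logits  # inference: return the three heads' logits
        # Joint objective: sum of the three per-task cross-entropy losses
        labels = (span_labels, entity_labels, risk_labels)
        return sum(self.loss(lg.transpose(1, 2), lb)
                   for lg, lb in zip(logits, labels))
```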

Current cross-domain results with Oberlandesgericht (OLG) data

To evaluate the generalisability of the multitask model, we ran additional evaluations on verdicts from ten law domains of higher regional courts (Oberlandesgerichte). The recall values for PII text spans show that our multitask model (trained on AG data only) already detects most of the high-risk text spans in all OLG domains except Immaterialgüter. We then trained a combined AG+OLG model on both datasets and obtained substantial improvements in all OLG domains compared to the initial cross-domain evaluation. The table below (all scores in %) also reports the number of training tokens, training documents, and tokens per document for each law domain in our OLG training sample.

| OLG domain           | Precision | Recall | Recall (PII) | Recall (AG+OLG) | Recall (PII, AG+OLG) | Training tokens (OLG) | Training documents (OLG) | Tokens/document (OLG) |
|----------------------|-----------|--------|--------------|-----------------|----------------------|-----------------------|--------------------------|-----------------------|
| Allg. Zivilsachen    | 89.22     | 93.61  | 97.91        | 96.26           | 99.30                | 112300                | 42                       | 2673.80               |
| Bankensachen         | 92.89     | 93.83  | 100.00       | 96.51           | 100.00               | 49530                 | 26                       | 1905.00               |
| Bausachen            | 94.48     | 97.13  | 94.86        | 98.16           | 96.37                | 59715                 | 18                       | 3317.50               |
| Beschwerdeverfahren  | 80.58     | 95.91  | 93.83        | 95.57           | 98.77                | 42872                 | 18                       | 2381.77               |
| Familiensachen       | 86.15     | 92.63  | 95.40        | 94.02           | 98.28                | 37691                 | 17                       | 2217.11               |
| Handelssachen        | 84.90     | 95.08  | 99.07        | 97.63           | 100.00               | 109228                | 24                       | 4551.16               |
| Immaterialgüter      | 78.65     | 77.36  | 83.90        | 83.11           | 87.80                | 87367                 | 21                       | 4160.33               |
| Kostensachen         | 85.53     | 94.67  | 100.00       | 98.00           | 100.00               | 11028                 | 8                        | 1378.50               |
| Schiedssachen        | 90.24     | 85.84  | 97.87        | 94.56           | 98.94                | 35399                 | 12                       | 2949.91               |
| Verkehrsunfallsachen | 86.02     | 85.35  | 95.88        | 89.90           | 98.24                | 63300                 | 24                       | 2637.50               |
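Per-domain numbers like these can be computed by straightforward span-level counting. The sketch below shows one hypothetical way to do this; the document/span data layout and the "PII" label are assumptions about the format, not our actual evaluation code.

```python
# Hypothetical per-domain evaluation: span-level precision and recall,
# plus a separate recall over spans annotated as PII.
from collections import defaultdict

def span_scores(docs):
    """docs: list of dicts with 'domain', 'gold' and 'pred' span lists,
    where each span is a (start, end, label) tuple (assumed layout)."""
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0,
                                 "pii_tp": 0, "pii_fn": 0})
    for doc in docs:
        s = stats[doc["domain"]]
        gold, pred = set(doc["gold"]), set(doc["pred"])
        s["tp"] += len(gold & pred)   # exact-match spans
        s["fp"] += len(pred - gold)
        s["fn"] += len(gold - pred)
        pii = {sp for sp in gold if sp[2] == "PII"}  # assumed PII label
        s["pii_tp"] += len(pii & pred)
        s["pii_fn"] += len(pii - pred)
    for domain, s in sorted(stats.items()):
        p = s["tp"] / max(s["tp"] + s["fp"], 1)
        r = s["tp"] / max(s["tp"] + s["fn"], 1)
        r_pii = s["pii_tp"] / max(s["pii_tp"] + s["pii_fn"], 1)
        print(f"{domain:22s} P={p:.4f} R={r:.4f} R(PII)={r_pii:.4f}")
```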

Learning curves

The graph below shows anonymisation performance at different training-data sizes and suggests that there is still room for improvement, as the curves continue to rise steadily. In particular, fine-tuning an OLG model still requires more data: the best recall is only around 93% for all LLMs we used during the experiments. This once again illustrates the importance of domain adaptation.

[Figure: Learning curves – results of fine-tuned anonymisation models with different training-data sizes]
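A learning curve of this kind can be produced by fine-tuning on growing slices of the training data and measuring recall on a fixed test set, as in the sketch below. Here `fine_tune()` and `pii_recall()` are placeholders standing in for a pipeline like the fine-tuning sketch above.

```python
# Sketch: record test recall while training on growing data slices.
import random

def learning_curve(train_docs, test_docs,
                   fractions=(0.1, 0.25, 0.5, 0.75, 1.0), seed=42):
    # Shuffle once so every slice is a superset of the previous one
    random.Random(seed).shuffle(train_docs)
    curve = []
    for frac in fractions:
        subset = train_docs[:int(len(train_docs) * frac)]
        model = fine_tune(subset)              # placeholder: e.g. the
        recall = pii_recall(model, test_docs)  # GottBERT sketch above
        curve.append((len(subset), recall))
    return curve
```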

For further details about our experiments, see the following blog posts:

Blog LeAK – https://www.linguistik.phil.fau.de/2024/02/07/automatic-anonymisation-of-court-decisions/

Blog LeAK 2 – https://www.linguistik.phil.fau.de/2024/02/07/leak-2-automatic-anonymisation-experiments/
