Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

Abstract

Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblowing Directive (WBD), are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named-entity (NE) labels) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario since it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use an LLM that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool's effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we statistically measure the protection against authorship attribution (AA) attacks and the utility loss using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.
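
The word/term-level step described in the abstract can be illustrated with a small sketch. This is not the authors' tool: the three risk levels, the `Term` class, the `sanitize` function, and the sample sentence are all hypothetical, chosen only to show how a per-term risk label (set automatically and possibly overridden by the whistleblower) could select a different anonymization action. The LLM paraphrasing step that the abstract applies afterwards is omitted.

```python
from dataclasses import dataclass

# Hypothetical risk levels; the paper's actual classification is not reproduced here.
LOW, MEDIUM, HIGH = "low", "medium", "high"


@dataclass
class Term:
    text: str
    risk: str = LOW            # default label; a user could override it
    generalization: str = ""   # broader wording to use for medium-risk terms


def sanitize(terms):
    """Apply a risk-adapted action to each term and join what remains."""
    kept = []
    for term in terms:
        if term.risk == HIGH:
            continue  # suppress high-risk terms entirely
        if term.risk == MEDIUM and term.generalization:
            kept.append(term.generalization)  # swap in the broader wording
        else:
            # low-risk terms, and medium-risk terms without a generalization,
            # are kept unchanged
            kept.append(term.text)
    return " ".join(kept)


# Illustrative input only; not taken from the paper's data.
example = [
    Term("Our"), Term("manager"), Term("Dr. Smith", HIGH),
    Term("approved"), Term("the"), Term("invoices"), Term("in"),
    Term("Hamburg", MEDIUM, "a German city"),
]

print(sanitize(example))
```

In this toy run the name is dropped and the city is generalized, while the remaining words pass through unchanged. A real pipeline would also have to address writing style, which this sketch does not attempt.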

Abstract (translated)

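Whistleblowing is essential for ensuring transparency and accountability in the public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when they report anonymously. The specific content of their disclosures and their distinctive writing style may cause them to be re-identified as the source. Legal measures, such as the EU Whistleblowing Directive (WBD), are limited in scope and effectiveness. Computational methods to prevent re-identification are therefore an important complementary tool for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly narrow view of anonymity. They try to reduce identification risk by replacing typical high-risk words (such as person names and other labels) with placeholders. This approach is inadequate for the whistleblowing scenario because it ignores further re-identification potential in textual features, including writing style. We therefore propose a novel classification and mitigation strategy that involves the whistleblower in assessing risk and utility. Our prototype tool evaluates risk at the word/phrase level and applies risk-adapted anonymization techniques to produce a grammatically disjointed but appropriately sanitized text. We then use an LLM that we fine-tuned for paraphrasing to produce a coherent, style-neutral text. We measure our tool's effectiveness and statistically evaluate its protection against authorship attribution attacks and its utility loss using the IMDb62 movie reviews dataset. Our method can reduce attribution accuracy from 98.81% to 31.22% while preserving 73.1% of the original content's semantics.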

URL

https://arxiv.org/abs/2405.01097

PDF

https://arxiv.org/pdf/2405.01097.pdf
