Abstract
We propose ReSeTOX (REdo SEarch if TOXic), a method that addresses the problem of Neural Machine Translation (NMT) systems producing translations that contain toxic words absent from the input. The goal is to mitigate this added toxicity without re-training the model. When added toxicity is detected during inference, ReSeTOX dynamically adjusts the key-value self-attention weights and re-evaluates the beam search hypotheses. Experiments show that ReSeTOX reduces added toxicity by 57% while retaining, on average, 99.5% of translation quality across 164 languages.
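The control flow described in the abstract (detect added toxicity, adjust the model state, redo the search) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the word-list toxicity check, the sentence-level retry loop, and the `max_redos` limit are all assumptions made for illustration, and the `decode` and `adjust` callables are hypothetical placeholders for a real beam search and a real update of the attention key-values.

```python
from typing import Callable, Optional, Set

# Hypothetical stand-in for whatever the real system modifies between
# search passes (per the abstract, the attention key-values).
State = Optional[int]
Decoder = Callable[[str, State], str]    # (source, state) -> translation
Adjuster = Callable[[State, str], State]  # (state, hypothesis) -> new state


def contains_toxic(text: str, lexicon: Set[str]) -> bool:
    """Return True if any whitespace-separated token is in the lexicon."""
    return any(token in lexicon for token in text.lower().split())


def translate_with_redo(
    source: str,
    decode: Decoder,
    adjust: Adjuster,
    source_lexicon: Set[str],
    target_lexicon: Set[str],
    max_redos: int = 3,  # assumed cap, not taken from the paper
) -> str:
    """Decode, and redo the search while the output shows added toxicity."""
    state: State = None
    hypothesis = decode(source, state)
    for _ in range(max_redos):
        # "Added" toxicity: toxic output although the input is not toxic.
        added = contains_toxic(hypothesis, target_lexicon) and not contains_toxic(
            source, source_lexicon
        )
        if not added:
            break
        state = adjust(state, hypothesis)   # placeholder for the real update
        hypothesis = decode(source, state)  # redo the search
    return hypothesis


# Toy stand-ins so the sketch runs end to end; a real system would call an
# NMT model here instead.
def toy_decode(source: str, state: State) -> str:
    return "you are a fool" if state is None else "you are a friend"


def toy_adjust(state: State, hypothesis: str) -> State:
    return 1 if state is None else state + 1
```

With these toy stand-ins, a non-toxic source such as "tu es un ami" triggers one redo and yields the clean output, while a source that is itself toxic is left alone, because the toxicity in the output was not added by the translation.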
URL
https://arxiv.org/abs/2305.11761