Abstract
Forensic scientists often need to identify an unknown speaker or writer in cases such as ransom calls, covert recordings, alleged suicide notes, or anonymous online communications. Speaker recognition in the speech domain usually examines phonetic or acoustic properties of a voice, and these methods can be accurate and robust under certain conditions. However, if a speaker disguises their voice or employs text-to-speech software, vocal properties may no longer be reliable, leaving only the linguistic content of the speech available for analysis. Authorship attribution methods traditionally use syntactic, semantic, and related linguistic information to identify the writers of texts. In this paper, we apply a content-based authorship approach to speech that has been transcribed into text, using what a speaker says to attribute speech to individuals (speaker attribution). We introduce a stylometric method, StyloSpeaker, which incorporates character, word, token, sentence, and style features from the stylometric literature on authorship to assess whether two transcripts were produced by the same speaker. We evaluate this method on two types of transcript formatting: one that approximates prescriptive written text, with capitalization and punctuation, and a normalized style that removes these conventions. The transcripts' conversation topics are also controlled to varying degrees. We find generally higher attribution performance on normalized transcripts, except under the strongest topic control condition, in which overall performance is highest. Finally, we compare this more explainable stylometric model to black-box neural approaches on the same data and investigate which stylistic features most effectively distinguish speakers.
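As a rough illustration of the two transcript formats mentioned in the abstract, the following Python sketch lowercases a transcript and strips punctuation, then computes a few simple style measurements. The function names, the exact normalization rules (ASCII punctuation only), and the three features are assumptions chosen for illustration; they are not the StyloSpeaker feature set or the authors' preprocessing.

```python
import string
from collections import Counter


def normalize_transcript(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation.

    Assumption: this approximates a "normalized" transcript; the paper's
    actual normalization rules may differ.
    """
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any runs of whitespace left behind by removed punctuation.
    return " ".join(stripped.lower().split())


def style_features(text: str) -> dict:
    """Compute a few illustrative stylometric features (hypothetical set)."""
    words = text.split()
    n_words = len(words)
    vocabulary = Counter(w.lower() for w in words)
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        # Mean number of characters per whitespace-separated word.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Distinct words divided by total words.
        "type_token_ratio": len(vocabulary) / n_words if n_words else 0.0,
        # Share of characters that are ASCII punctuation.
        "punctuation_rate": n_punct / len(text) if text else 0.0,
    }


if __name__ == "__main__":
    raw = "Well, I don't know. Maybe!"
    print(normalize_transcript(raw))
    print(style_features(raw))
    print(style_features(normalize_transcript(raw)))
```

Comparing the feature dictionaries for the raw and normalized versions of the same transcript shows which measurements depend on capitalization and punctuation and which survive normalization.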
Abstract (translated)
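Forensic scientists often need to identify an unknown speaker or writer in cases such as ransom calls, covert recordings, suspected suicide notes, or anonymous online communications. Speaker recognition in the speech domain usually examines the phonetic or acoustic properties of a voice, and these methods can achieve high accuracy and robustness under certain conditions. However, if a speaker disguises their voice or uses text-to-speech software, vocal characteristics alone may no longer be reliable, leaving only the linguistic content available for analysis. Traditionally, authorship identification methods use syntactic, semantic, and related linguistic information to determine the author of a written text (authorship attribution). In this paper, we apply a content-based authorship attribution approach to speech that has been converted to text, using what a speaker says to attribute speech to individuals. We introduce a stylometric method, StyloSpeaker, which combines character, word, token, sentence, and style features from the stylometric literature on authorship attribution to assess whether two transcripts were produced by the same person. We test this method on two types of transcript formatting: one approximating standard written text, including capitalization and punctuation, and a normalized style that removes these conventions. The topics of the transcripts are also controlled. We find that performance is generally better on normalized transcripts, except under the strongest topic control condition, in which overall performance is highest. Finally, we compare this more explainable stylometric model with black-box neural network approaches and investigate which stylistic features most effectively distinguish speakers.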
URL
https://arxiv.org/abs/2512.13667