Abstract
A problem with many current Large Language Model (LLM)-driven spoken dialogue systems is response time. Some efforts, such as Groq, address this issue through lightning-fast LLM processing, but the cognitive psychology literature shows that in human-to-human dialogue, responses often begin before the speaker has completed their utterance. No LLM processing delay is acceptable if we wish to maintain human dialogue latencies. In this paper, we discuss methods for understanding an utterance in close to real time and generating a response quickly enough for the system to comply with human-level conversational turn delays. This means that the information content of the final part of the speaker's utterance is lost to the LLM. Using the Google Natural Questions (NQ) dataset, our results show that GPT-4 can effectively fill in the missing context from a word dropped at the end of a question over 60% of the time. We also provide examples of utterances and the impact of this information loss on the quality of LLM responses in the context of an avatar currently under development. These results indicate that a simple classifier could be used to determine whether a question is semantically complete or requires a filler phrase to allow a response to be generated within human dialogue time constraints.
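The truncation setup the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `drop_last_word`, `needs_filler`, and the function-word heuristic are hypothetical names and logic standing in for the trained semantic-completeness classifier the paper proposes, and a real system would forward the truncated text to an LLM such as GPT-4.

```python
# Simulate losing the final word of an utterance to meet human-level
# turn-taking latency, then decide whether a filler phrase is needed.

def drop_last_word(question: str) -> str:
    """Remove the last word of a question, mimicking real-time cutoff."""
    words = question.rstrip("?").split()
    return " ".join(words[:-1])

# Hypothetical stand-in for a trained semantic-completeness classifier:
# flag the truncated question as incomplete when it ends in a function
# word, since the dropped word then likely carried essential content.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to",
                  "for", "with", "is", "was"}

def needs_filler(truncated: str) -> bool:
    """Return True if a filler phrase should buy time for more input."""
    last = truncated.split()[-1].lower()
    return last in FUNCTION_WORDS

# NQ-style example (question text illustrative):
truncated = drop_last_word("who sings does he love me with reba")
# truncated == "who sings does he love me with" -> filler needed,
# because the semantically critical final word was lost.
```

In a deployed dialogue system, the heuristic would be replaced by the classifier trained for this decision, and `needs_filler` would gate whether the avatar emits a filler phrase or responds immediately.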
URL
https://arxiv.org/abs/2404.16053