Abstract
The recent advancements in deep learning models and techniques have led to significant strides in performance across diverse tasks and modalities. However, while the overall capabilities of models show promising growth, our understanding of their internal reasoning processes remains limited, particularly concerning systematic inconsistencies or errors: patterns of logical or inferential flaws. These inconsistencies may manifest as contradictory outputs, failures to generalize across similar tasks, or erroneous conclusions in specific contexts. Even detecting and measuring such reasoning discrepancies is challenging, as they may arise from opaque internal procedures, biases and imbalances in training data, or the inherent complexity of the task. Without effective methods to detect, measure, and mitigate these errors, there is a risk of deploying models that are biased, exploitable, or logically unreliable. This thesis addresses these issues by developing novel methods for deep learning models that reason over knowledge graphs, natural language, and images. It contributes two techniques for detecting and quantifying predictive inconsistencies that originate from opaque internal procedures in natural language and image processing models. To mitigate inconsistencies stemming from biases in the training data, the thesis presents a data-efficient sampling method that improves fairness and performance, together with a synthetic dataset generation approach for low-resource scenarios. Finally, the thesis offers two techniques to optimize models for complex reasoning tasks. These methods enhance model performance while allowing for more faithful and interpretable exploration and exploitation during inference. Critically, this thesis provides a comprehensive framework to improve the robustness, fairness, and interpretability of deep learning models across diverse tasks and modalities.
URL
https://arxiv.org/abs/2504.02577