The rapid advancement of generative Artificial Intelligence (AI) technologies, particularly Generative Pre-trained Transformer (GPT) models such as ChatGPT, has the potential to significantly impact cybersecurity. In this study, we investigated the impact of GPTs, specifically ChatGPT, on tertiary cybersecurity education and provided recommendations for universities to adapt their curricula to meet the evolving needs of the industry. Our research highlighted the importance of understanding the alignment between GPT's ``mental model'' and human cognition, as well as of benchmarking GPT capabilities against human skills using Bloom's taxonomy. By analyzing current educational practices and the alignment of curricula with industry requirements, we concluded that universities offering practical degrees such as cybersecurity should align closely with industry demand and embrace the inevitable generative AI revolution, while applying stringent ethics oversight to safeguard responsible GPT usage. We proposed a set of recommendations focused on updating university curricula, promoting agility within universities, fostering collaboration between academia, industry, and policymakers, and assessing educational outcomes.
https://arxiv.org/abs/2403.11402
Causality has become a fundamental approach for explaining the relationships between events, phenomena, and outcomes in various fields of study. It has permeated numerous fields and applications, such as medicine, healthcare, economics, finance, fraud detection, cybersecurity, education, public policy, recommender systems, anomaly detection, robotics, control, sociology, marketing, and advertising. In this paper, we survey its development over the past five decades, shedding light on the differences between causality and other approaches, as well as the preconditions for using it. Furthermore, the paper illustrates how causality interacts with newer approaches such as Artificial Intelligence (AI), Generative AI (GAI), Machine and Deep Learning, Reinforcement Learning (RL), and Fuzzy Logic. We study the impact of causality on various fields, its contribution, and its interaction with state-of-the-art approaches. Additionally, the paper illustrates the trustworthiness and explainability of causality models. We offer several ways to evaluate causality models and discuss future directions.
https://arxiv.org/abs/2403.11219
This paper presents a comprehensive examination of the impact of tokenization strategies and vocabulary sizes on the performance of Arabic language models in downstream natural language processing tasks. Our investigation focused on the effectiveness of four tokenizers across various tasks, including News Classification, Hate Speech Detection, Sentiment Analysis, and Natural Language Inference. Leveraging a diverse set of vocabulary sizes, we scrutinize the intricate interplay between tokenization approaches and model performance. The results reveal that Byte Pair Encoding (BPE) with Farasa outperforms other strategies in multiple tasks, underscoring the significance of morphological analysis in capturing the nuances of the Arabic language. However, challenges arise in sentiment analysis, where dialect-specific segmentation issues impact model efficiency. Computational efficiency analysis demonstrates the stability of BPE with Farasa, suggesting its practical viability. Our study uncovers a limited impact of vocabulary size on model performance when the model size is held constant. This challenges established beliefs about the relationship between vocabulary size, model size, and downstream tasks, emphasizing the need to study model sizes and their corresponding vocabulary sizes in order to generalize across domains and mitigate biases, particularly in dialect-based datasets. The paper's recommendations include refining tokenization strategies to address dialect challenges, enhancing model robustness across diverse linguistic contexts, and expanding datasets to encompass the rich dialectal variety of Arabic. This work not only advances our understanding of Arabic language models but also lays the foundation for responsible and ethical development of natural language processing technologies tailored to the intricacies of the Arabic language.
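As a rough, illustrative sketch of the subword tokenization at the heart of this comparison, the core BPE merge-learning loop can be written in a few lines of pure Python (a simplified toy, not the Farasa-augmented tokenizers evaluated in the paper):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge rules from a word list (simplified sketch)."""
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for seq, freq in vocab.items():
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        new_vocab = Counter()
        for seq, freq in vocab.items():
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(seq[i] + seq[i + 1])
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

Morphology-aware variants such as BPE with Farasa first segment words into morphemes before learning merges, which is what the results above credit for the gains.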
https://arxiv.org/abs/2403.11130
Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on the ID or text-based recommendation problem, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements in foundational generative modeling have provided the flexibility and effectiveness necessary to achieve the objective. In light of this, we develop a generic and extensible personalization generative framework, that can handle a wide range of personalized needs including item recommendation, product search, preference prediction, explanation generation, and further user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on the real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.
https://arxiv.org/abs/2403.10667
Large language models (LLMs) have shown excellent performance on various NLP tasks. To use LLMs as strong sequential recommenders, we explore the in-context learning approach to sequential recommendation. We investigate the effects of instruction format, task consistency, demonstration selection, and number of demonstrations. As increasing the number of demonstrations in ICL does not improve accuracy despite using a long prompt, we propose a novel method called LLMSRec-Syn that incorporates multiple demonstration users into one aggregated demonstration. Our experiments on three recommendation datasets show that LLMSRec-Syn outperforms state-of-the-art LLM-based sequential recommendation methods. In some cases, LLMSRec-Syn can perform on par with or even better than supervised learning methods. Our code is publicly available at this https URL.
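The aggregation idea can be sketched as follows; the prompt template, field names, and item identifiers here are hypothetical and for illustration only, not the paper's actual format:

```python
def build_aggregated_demo_prompt(demo_users, target_history, candidates):
    """Fold several demonstration users into one aggregated demonstration.

    demo_users: list of (history, next_item) pairs from different users.
    The template below is a hypothetical stand-in for LLMSRec-Syn's prompt.
    """
    agg_history = "; ".join(", ".join(h) for h, _ in demo_users)
    agg_next = ", ".join(nxt for _, nxt in demo_users)
    lines = [
        "Example (aggregated from several users):",
        f"  watched: {agg_history}",
        f"  next: {agg_next}",
        "Now the target user:",
        f"  watched: {', '.join(target_history)}",
        f"  candidates: {', '.join(candidates)}",
        "  next:",
    ]
    return "\n".join(lines)
```

Packing several users into one demonstration keeps the prompt short while still exposing the model to multiple interaction patterns.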
https://arxiv.org/abs/2403.10135
Click-through rate (CTR) prediction is a core task in recommender systems. Existing methods (IDRec for short) rely on unique identities to represent distinct users and items, a paradigm that has prevailed for decades. On one hand, IDRec often faces significant performance degradation on the cold-start problem; on the other hand, IDRec cannot use longer training data due to constraints imposed by iteration efficiency. Most prior studies alleviate the above problems by introducing pre-trained knowledge (e.g., pre-trained user models or multi-modal embeddings). However, the huge number of parameters in pre-trained models leads to an explosive growth in online latency. Therefore, most of them cannot be combined with IDRec into a unified, end-to-end-trained model in industrial recommender systems, thus limiting the potential of the pre-trained model. To this end, we propose a $\textbf{P}$re-trained $\textbf{P}$lug-in CTR $\textbf{M}$odel, namely PPM. PPM employs multi-modal features as input and utilizes large-scale data for pre-training. Then, PPM is plugged into the IDRec model to enhance the unified model's performance and iteration efficiency. Upon incorporating the IDRec model, certain intermediate results within the network are cached, with only a subset of the parameters participating in training and serving. Hence, our approach can successfully deploy an end-to-end model without causing huge latency increases. Comprehensive offline experiments and online A/B testing at JD E-commerce demonstrate the efficiency and effectiveness of PPM.
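The caching idea, computing a frozen pre-trained representation once and reusing it at serving time, can be sketched with a minimal memoizing wrapper (an illustrative sketch; `encode_fn` stands in for the actual pre-trained forward pass):

```python
class CachedEncoder:
    """Cache outputs of an expensive pre-trained encoder (illustrative).

    `encode_fn` is a stand-in for a frozen pre-trained model's forward
    pass; only the cache-on-miss behavior mirrors the idea of caching
    intermediate results so serving latency stays low.
    """

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.cache = {}
        self.misses = 0  # number of actual encoder invocations

    def __call__(self, item_id):
        if item_id not in self.cache:
            self.misses += 1
            self.cache[item_id] = self.encode_fn(item_id)
        return self.cache[item_id]
```

In a real system the cache would be bounded and invalidated when the pre-trained model is refreshed; here it only illustrates why the heavy model need not run on every request.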
https://arxiv.org/abs/2403.10049
Self-supervised learning (SSL) has recently emerged as a powerful approach to learning representations from large-scale unlabeled data, showing promising results in time series analysis. Self-supervised representation learning can be categorized into two mainstream paradigms: contrastive and generative. In this paper, we present a comprehensive comparative study between contrastive and generative methods in time series. We first introduce the basic frameworks for contrastive and generative SSL, respectively, and discuss how to obtain the supervision signal that guides the model optimization. We then implement classical algorithms (SimCLR vs. MAE) for each type and conduct a comparative analysis in fair settings. Our results provide insights into the strengths and weaknesses of each approach and offer practical recommendations for choosing suitable SSL methods. We also discuss the implications of our findings for the broader field of representation learning and propose future research directions. All the code and data are released at \url{this https URL}.
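The contrastive supervision signal can be made concrete with a tiny, pure-Python InfoNCE / NT-Xent-style loss for a single anchor, the kind of objective SimCLR optimizes (a toy illustration, not the paper's implementation):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor: pull the positive close,
    push the negatives away (toy, pure-Python sketch)."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))
```

Generative methods such as MAE instead obtain their signal by reconstructing masked-out parts of the input, which is the core difference the comparative study examines.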
https://arxiv.org/abs/2403.09809
Large Language Models (LLMs) are poised to play an increasingly important role in our lives, providing assistance across a wide array of tasks. In the geospatial domain, LLMs have demonstrated the ability to answer generic questions, such as identifying a country's capital; nonetheless, their utility is hindered when it comes to answering fine-grained questions about specific places, such as grocery stores or restaurants, which constitute essential aspects of people's everyday lives. This is mainly because the places in our cities have not been systematically fed into LLMs, so that the models can understand and memorize them. This study introduces a novel framework for fine-tuning a pre-trained model on city-specific data, to enable it to provide accurate recommendations while minimizing hallucinations. We share our model, LAMP, and the data used to train it. We conduct experiments to analyze its ability to correctly retrieve spatial objects, and compare it to well-known open- and closed-source language models, such as GPT-4. Finally, we explore its emerging capabilities through a case study on day planning.
https://arxiv.org/abs/2403.09059
In the competitive realm of sports, optimal performance necessitates rigorous management of nutrition and physical conditioning. Specifically, in badminton, the agility and precision required make it an ideal candidate for motion analysis through video analytics. This study leverages advanced neural network methodologies to dissect video footage of badminton matches, aiming to extract detailed insights into player kinetics and biomechanics. Through the analysis of stroke mechanics, including hand-hip coordination, leg positioning, and the execution angles of strokes, the research aims to derive predictive models that can suggest improvements in stance, technique, and muscle orientation. These recommendations are designed to mitigate erroneous techniques, reduce the risk of joint fatigue, and enhance overall performance. Utilizing a vast array of data available online, this research correlates players' physical attributes with their in-game movements to identify muscle activation patterns during play. The goal is to offer personalized training and nutrition strategies that align with the specific biomechanical demands of badminton, thereby facilitating targeted performance enhancements.
https://arxiv.org/abs/2403.08956
Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol comprises five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate that these tasks effectively reveal deviations of language models from human behavior, and we offer insights on how to reduce the deviations with model selection and prompting strategies.
https://arxiv.org/abs/2403.09738
Existing Machine Learning approaches for local citation recommendation directly map or translate a query, which is typically a claim or an entity mention, to citation-worthy research papers. Within such a formulation, it is challenging to pinpoint why one should cite a specific research paper for a particular query, leading to limited recommendation interpretability. To alleviate this, we introduce the evidence-grounded local citation recommendation task, where the target latent space comprises evidence spans for recommending specific papers. Using a distantly-supervised evidence retrieval and multi-step re-ranking framework, our proposed system, ILCiteR, recommends papers to cite for a query grounded on similar evidence spans extracted from the existing research literature. First, unlike past formulations that simply output recommendations, ILCiteR retrieves ranked lists of evidence spans paired with recommended papers. Second, previously proposed neural models for citation recommendation require expensive training on massive labeled data, ideally after every significant update to the pool of candidate papers. In contrast, ILCiteR relies solely on distant supervision from a dynamic evidence database and pre-trained Transformer-based Language Models without any model training. We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.
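The conditional neural rank-ensembling described here is learned; as a much simpler, non-neural illustration of the general idea of combining several ranked lists, classic reciprocal rank fusion can be sketched as:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one final ranking.

    Classic RRF (not the paper's learned ensembling): each list votes
    1 / (k + rank) for every item it contains; items are sorted by the
    summed score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A learned ensembler replaces the fixed 1/(k + rank) vote with scores conditioned on the query and evidence, but the combine-then-rerank structure is the same.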
https://arxiv.org/abs/2403.08737
The exponential growth in the scale and relevance of social networks enables them to provide expansive insights. Efficiently predicting missing links in social networks can help in various modern-day business applications, ranging from generating recommendations to influence analysis. Several categories of solutions exist for this task. Here, we explore various feature extraction techniques to generate representations of nodes and edges in a social network that allow us to predict missing links. We compare the results of using ten feature extraction techniques categorized across Structural embeddings, Neighborhood-based embeddings, Graph Neural Networks, and Graph Heuristics, followed by modeling with ensemble classifiers and custom Neural Networks. Further, we propose combining heuristic-based features and learned representations, which demonstrates improved performance for the link prediction task on social network datasets. Using this method to generate accurate recommendations for many applications is a promising subject for further study. The code for all the experiments has been made public.
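The graph-heuristic family of features mentioned above can be computed directly from the adjacency structure; the sketch below derives a few classic candidate-edge heuristics (common neighbors, Jaccard, Adamic-Adar, preferential attachment) from an adjacency map:

```python
import math

def heuristic_link_features(adj, u, v):
    """Classic graph-heuristic features for a candidate edge (u, v).

    adj maps each node to its set of neighbors. These scalar features
    can be concatenated with learned embeddings before classification.
    """
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    # Adamic-Adar discounts high-degree common neighbors logarithmically.
    adamic_adar = sum(1.0 / math.log(len(adj[w]))
                      for w in common if len(adj[w]) > 1)
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "adamic_adar": adamic_adar,
        "preferential_attachment": len(adj[u]) * len(adj[v]),
    }
```

Concatenating such a feature dictionary with node-embedding similarities is one straightforward way to realize the combined heuristic-plus-learned representation proposed above.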
https://arxiv.org/abs/2403.08613
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses. However, their lack of domain-specific knowledge limits their applicability in healthcare settings, where contextual and comprehensive responses are vital. To address this challenge and enable the generation of patient-centric responses that are contextually relevant and comprehensive, we propose MedInsight: a novel retrieval-augmented framework that augments LLM inputs (prompts) with relevant background information from multiple sources. MedInsight extracts pertinent details from the patient's medical record or consultation transcript. It then integrates information from authoritative medical textbooks and curated web resources based on the patient's health history and condition. By constructing an augmented context combining the patient's record with relevant medical knowledge, MedInsight generates enriched, patient-specific responses tailored for healthcare applications such as diagnosis, treatment recommendations, or patient education. Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses. Quantitative evaluation using the Ragas metric and TruLens for answer similarity and answer correctness demonstrates the model's efficacy. Furthermore, human evaluation studies involving Subject Matter Experts (SMEs) confirm MedInsight's utility, with moderate inter-rater agreement on the relevance and correctness of the generated responses.
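The retrieve-then-augment pattern underlying such frameworks can be sketched with a toy word-overlap retriever; this is a minimal, hypothetical stand-in, and MedInsight's actual retrieval and prompt construction are more sophisticated:

```python
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def augment_prompt(query, corpus, k=2):
    """Prepend the top-k retrieved passages to the question as context."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Production systems replace the overlap score with dense embeddings and add source filtering, but the prompt shape, retrieved context followed by the question, is the same.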
https://arxiv.org/abs/2403.08607
While code review is central to the software development process, it can be tedious and expensive to carry out. In this paper, we investigate whether and how Large Language Models (LLMs) can aid with code reviews. Our investigation focuses on two tasks that we argue are fundamental to good reviews: (i) flagging code with security vulnerabilities and (ii) performing software functionality validation, i.e., ensuring that code meets its intended functionality. To test performance on both tasks, we use zero-shot and chain-of-thought prompting to obtain final ``approve or reject'' recommendations. As data, we employ seminal code generation datasets (HumanEval and MBPP) along with expert-written code snippets with security vulnerabilities from the Common Weakness Enumeration (CWE). Our experiments consider a mixture of three proprietary models from OpenAI and smaller open-source LLMs. We find that the former outperforms the latter by a large margin. Motivated by promising results, we finally ask our models to provide detailed descriptions of security vulnerabilities. Results show that 36.7% of LLM-generated descriptions can be associated with true CWE vulnerabilities.
https://arxiv.org/abs/2403.08429
Addressing the so-called ``Red-AI'' trend of rising energy consumption by large-scale neural networks, this study investigates the actual energy consumption, as measured by node-level watt-meters, of training various fully connected neural network architectures. We introduce the BUTTER-E dataset, an augmentation of the BUTTER Empirical Deep Learning dataset, containing energy consumption and performance data from 63,527 individual experimental runs spanning 30,582 distinct configurations: 13 datasets, 20 sizes (numbers of trainable parameters), 8 network ``shapes'', and 14 depths, on both CPU and GPU hardware, collected using node-level watt-meters. This dataset reveals the complex relationship between dataset size, network structure, and energy use, and highlights the impact of cache effects. We propose a straightforward and effective energy model that accounts for network size, computing, and memory hierarchy. Our analysis also uncovers a surprising, hardware-mediated non-linear relationship between energy efficiency and network design, challenging the assumption that reducing the number of parameters or FLOPs is the best way to achieve greater energy efficiency. Highlighting the need for cache-considerate algorithm development, we suggest a combined approach to energy-efficient network, algorithm, and hardware design. This work contributes to the fields of sustainable computing and Green AI, offering practical guidance for creating more energy-efficient neural networks and promoting sustainable AI.
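The general shape of an energy model that charges compute and memory traffic separately can be sketched as a two-term cost function; the constants below are hypothetical, and this is not the fitted model from the paper:

```python
def estimate_training_energy(flops, dram_bytes,
                             joules_per_flop=1e-11,
                             joules_per_byte=1e-10):
    """Two-term compute/memory energy model (hypothetical constants).

    Not the paper's fitted model; it only illustrates the idea of
    charging compute and memory traffic separately, so a memory-bound
    network can cost more energy than its FLOP count alone suggests.
    """
    return flops * joules_per_flop + dram_bytes * joules_per_byte
```

Under such a model, shrinking FLOPs without shrinking memory traffic yields little savings, which is consistent with the non-linear efficiency behavior described above.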
https://arxiv.org/abs/2403.08151
Natural Language Processing (NLP) is an important branch of artificial intelligence that studies how to enable computers to understand, process, and generate human language. Text classification, which aims to assign text to predefined categories, is the most basic and classic task in NLP; indeed, most NLP tasks can be regarded as classification tasks. In recent years, deep learning has achieved great success in many research fields, and today it has also become a standard technology in the field of NLP, widely integrated into text classification tasks. Unlike numerical and image data, text demands fine-grained processing. Traditional text classification methods generally require preprocessing the input text data, obtaining good sample features through manual annotation, and then applying classical machine learning algorithms for classification. Therefore, this paper analyzes the application status of deep learning in three core NLP tasks (text representation, word-order modeling, and knowledge representation). It explores the improvement and synergy achieved through natural language processing in the context of text classification, while also taking into account the challenges posed by adversarial techniques in text generation, text classification, and semantic parsing. An empirical study on text classification tasks demonstrates the effectiveness of interactive integration training, particularly in conjunction with TextCNN, highlighting the significance of these advancements for text classification augmentation and enhancement.
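The TextCNN building block referenced here, a convolution over token embeddings followed by max-pooling, can be sketched in pure Python for a single filter (a toy illustration of the mechanism, not the actual architecture):

```python
def conv_maxpool_feature(embeddings, kernel):
    """One TextCNN-style feature for a sentence.

    embeddings: list of d-dimensional token vectors.
    kernel: list of k d-dimensional filter rows (window size k).
    Slides the filter over every k-token window, takes the dot product,
    and max-pools over positions to get one scalar feature.
    """
    k, d = len(kernel), len(kernel[0])
    responses = []
    for i in range(len(embeddings) - k + 1):
        s = sum(kernel[j][c] * embeddings[i + j][c]
                for j in range(k) for c in range(d))
        responses.append(s)
    return max(responses)
```

A full TextCNN learns many such filters of several window sizes, concatenates the pooled features, and feeds them to a softmax classifier.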
https://arxiv.org/abs/2403.09718
The rapid proliferation of digital content and the ever-growing need for precise object recognition and segmentation have driven the advancement of cutting-edge techniques in the field of object classification and segmentation. This paper introduces "Learn and Search", a novel approach for object lookup that leverages the power of contrastive learning to enhance the efficiency and effectiveness of retrieval systems. In this study, we present an elegant and innovative methodology that integrates deep learning principles and contrastive learning to tackle the challenges of object search. Our extensive experimentation reveals compelling results, with "Learn and Search" achieving superior Similarity Grid Accuracy, showcasing its efficacy in discerning regions of utmost similarity within an image relative to a cropped image. The seamless fusion of deep learning and contrastive learning to address the intricacies of object identification not only promises transformative applications in image recognition, recommendation systems, and content tagging but also revolutionizes content-based search and retrieval. The amalgamation of these techniques, as exemplified by "Learn and Search," represents a significant stride in the ongoing evolution of methodologies in the dynamic realm of object classification and segmentation.
https://arxiv.org/abs/2403.07231
The utilization of semantic information is an important research problem in the field of recommender systems, which aims to complement the missing parts of mainstream ID-based approaches. With the rise of LLMs, their ability to act as a knowledge base and their reasoning capability have opened up new possibilities for this research area, making LLM-based recommendation an emerging research direction. However, directly using an LLM to process semantic information for recommendation scenarios is unreliable and sub-optimal due to several problems such as hallucination. A promising way to cope with this is to use external knowledge to aid the LLM in generating truthful and usable text. Inspired by the above motivation, we propose a Knowledge-Enhanced LLMRec method. In addition to using external knowledge in prompts, the proposed method also includes a knowledge-based contrastive learning scheme for training. Experiments on public datasets and in-enterprise datasets validate the effectiveness of the proposed method.
https://arxiv.org/abs/2403.06642
This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs). RecAI provides a suite of tools, including Recommender AI Agent, Recommendation-oriented Language Models, Knowledge Plugin, RecExplainer, and Evaluator, to facilitate the integration of LLMs into recommender systems from multifaceted perspectives. The new generation of recommender systems, empowered by LLMs, are expected to be more versatile, explainable, conversational, and controllable, paving the way for more intelligent and user-centric recommendation experiences. We hope that open-sourcing RecAI helps accelerate the evolution of new advanced recommender systems. The source code of RecAI is available at \url{this https URL}.
https://arxiv.org/abs/2403.06465
The long-tail recommendation is a challenging task for traditional recommender systems, due to data sparsity and data imbalance issues. The recent development of large language models (LLMs) has shown their abilities in complex reasoning, which can help to deduce users' preferences based on very few previous interactions. However, since most LLM-based systems rely on items' semantic meaning as the sole evidence for reasoning, the collaborative information of user-item interactions is neglected, which can cause the LLM's reasoning to be misaligned with task-specific collaborative information of the dataset. To further align LLMs' reasoning to task-specific user-item interaction knowledge, we introduce collaborative retrieval-augmented LLMs, CoRAL, which directly incorporate collaborative evidence into the prompts. Based on the retrieved user-item interactions, the LLM can analyze shared and distinct preferences among users, and summarize the patterns indicating which types of users would be attracted by certain items. The retrieved collaborative evidence prompts the LLM to align its reasoning with the user-item interaction patterns in the dataset. However, since the capacity of the input prompt is limited, finding the minimally-sufficient collaborative information for recommendation tasks can be challenging. We propose to find the optimal interaction set through a sequential decision-making process and develop a retrieval policy learned through a reinforcement learning (RL) framework, CoRAL. Our experimental results show that CoRAL can significantly improve LLMs' reasoning abilities on specific recommendation tasks. Our analysis also reveals that CoRAL can more efficiently explore collaborative information through reinforcement learning.
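A generic sequential-selection policy with epsilon-greedy exploration can illustrate the kind of decision process involved in picking a small interaction set under a prompt budget; this sketch is not CoRAL's actual RL policy, and `value_estimates` is a hypothetical stand-in for a learned value function:

```python
import random

def select_interactions(candidates, value_estimates, budget,
                        epsilon=0.1, seed=0):
    """Sequentially pick up to `budget` interactions to place in the prompt.

    With probability epsilon, explore a random candidate; otherwise
    exploit the highest-valued remaining one. A generic sketch, not
    CoRAL's learned retrieval policy.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < budget:
        if rng.random() < epsilon:
            pick = rng.choice(pool)  # explore
        else:
            pick = max(pool, key=lambda c: value_estimates.get(c, 0.0))  # exploit
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```

In an RL framing, `value_estimates` would be updated from the downstream recommendation reward, so the policy learns which interactions are minimally sufficient for the prompt.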
https://arxiv.org/abs/2403.06447