Pig aggression classification using CNN, Transformers and Recurrent Networks

Abstract

Techniques for analyzing and detecting animal behavior are crucial for the livestock sector: they make it possible to monitor stress and animal welfare, and they support decision making on the farm. Applications built on such techniques can help breeders make decisions that improve production performance and reduce costs, because animal behavior is currently analyzed by humans, which is time-consuming and error-prone. Aggressiveness in pigs is one behavior that is studied in order to reduce its impact through animal classification and identification. However, this process is laborious and susceptible to errors, which can be reduced through automation by visually classifying videos captured in a controlled environment. The captured videos can be used to train neural networks, which then classify new videos using computer vision and artificial intelligence. The main techniques used in this study are transformer variants (STAM, TimeSformer, and ViViT) and convolution-based techniques (ResNet3D2, Resnet(2+1)D, and CnnLstm). These techniques were applied to pig video classification with the objective of distinguishing aggressive from non-aggressive behavior. The techniques were compared to assess the contribution of transformers and the effectiveness of convolution in video classification. Performance was evaluated using accuracy, precision, and recall. TimeSformer showed the best results, with a median accuracy of 0.729.
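
The abstract names accuracy, precision, and recall as the evaluation metrics and reports a median accuracy. As a minimal sketch of how such figures could be computed for a binary aggressive/non-aggressive task, the Python below uses only the standard library. The function names, the label encoding (1 = aggressive, 0 = non-aggressive), and the idea of taking the median over several runs are assumptions made for illustration; the abstract does not describe the authors' evaluation code or protocol.

```python
from statistics import median


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels.

    Assumed encoding (hypothetical): 1 = aggressive, 0 = non-aggressive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        # Fraction of all videos classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the videos predicted aggressive, the fraction that truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the truly aggressive videos, the fraction that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def median_accuracy(runs):
    """Median accuracy over several (y_true, y_pred) runs (assumed protocol)."""
    return median(binary_metrics(t, p)["accuracy"] for t, p in runs)
```

For example, `binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])` counts two true positives, one false positive, one false negative, and two true negatives, so accuracy, precision, and recall each come out to 2/3.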

Abstract (translated)

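Developing techniques to analyze and detect animal behavior is crucial for the livestock sector, because it helps monitor stress and animal welfare and contributes to decision making on the farm. Applications built on these techniques can help breeders make decisions that improve production performance and reduce costs, given that animal behavior is otherwise analyzed by humans. The impact of aggressive behavior can be reduced by visually classifying videos captured in a controlled environment. However, this process is laborious and error-prone, and automating the visual classification of videos can reduce the errors. The captured videos can be used for training and then for classification through computer vision and artificial intelligence, using neural network techniques. The techniques used in this study include transformer variants (STAM, TimeSformer, and ViViT) and convolution-based techniques (ResNet3D2, Resnet(2+1)D, and CnnLstm). These techniques were used to classify videos into aggressive and non-aggressive behavior. The study also compares the effectiveness of transformers and convolution in video classification. TimeSformer performed best in video classification, with a median accuracy of 0.729.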

URL

https://arxiv.org/abs/2403.08528

PDF

https://arxiv.org/pdf/2403.08528.pdf
