Deep Learning In Image Processing For Object Recognition Using Various Techniques
Abstract
Owing to its close ties to both image understanding and video analysis, object detection has attracted rapidly growing research interest in recent years. Traditional object recognition methods are built on handcrafted features and shallow trainable architectures. Their performance quickly plateaus because they rely on complex ensembles that combine low-level image features with high-level context from object detectors and scene classifiers. Deep learning is developing rapidly and gives researchers more effective tools for addressing the limitations of these conventional architectures, since deep models can learn more complex, semantic features. These models differ in network architecture, training strategy, optimization function, and other components. This paper provides an overview of deep learning-based object detection techniques. We first review deep learning and its primary tool, the convolutional neural network (CNN).
We then discuss common generic object detection architectures and offer some useful tips and modifications for improving detection performance. Because specific detection tasks have distinct characteristics, we also briefly examine several of them, including salient object detection, face detection, and pedestrian detection. Experiments are provided to compare the various methods and draw some meaningful conclusions. A computer sees an image only as an array of numerical values, so image processing algorithms are needed to analyze the information that images contain. In terms of accuracy and speed, You Only Look Once (YOLO), Faster Region-based Convolutional Neural Networks (Faster R-CNN), and the Single Shot Detector (SSD) are the most popular object detection algorithms. This paper examines these three algorithms, comparing their performance and their individual strengths and weaknesses using metrics such as F1 score, accuracy, and precision. The dataset used is Microsoft COCO (Common Objects in Context). The experimental results show that which algorithm is preferable depends on the intended use case. Under identical testing conditions, YOLO-v3 performs best of the three, outperforming SSD and Faster R-CNN. Finally, several promising directions and open challenges are outlined as a basis for future research in object detection and related neural network-based learning systems.
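As an illustration of the evaluation metrics named above, the following minimal sketch computes precision and F1 score from detection counts. It is not the evaluation code used in the experiments. The function name `detection_metrics` and the example counts are hypothetical, and recall is included only because F1 is defined as the harmonic mean of precision and recall. The sketch assumes that detections have already been matched to ground-truth boxes, so that true positives (TP), false positives (FP), and false negatives (FN) are known.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from matched detection counts.

    tp: detections matched to a ground-truth object
    fp: detections with no matching ground-truth object
    fn: ground-truth objects that were not detected
    (Hypothetical helper for illustration only.)
    """
    # Precision: fraction of detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of ground-truth objects that were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts, not results from the experiments.
scores = detection_metrics(tp=80, fp=20, fn=20)
print(scores)
```

With these made-up counts, precision and recall are both 80/100, so the F1 score takes the same value. Accuracy is omitted from the sketch because true negatives are not naturally defined for object detection without further assumptions.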