Machine Learning Approaches for Detecting Fraudulent Claims in Veterinary Healthcare

Dr. Kondragunta Rama Krishnaiah; Dr. Harish H

doi:10.69980/redvet.v25i1S.1880

Keywords: Veterinary Healthcare, Fraud Detection, Machine Learning, C4.5 Decision Tree, Supervised Learning, AUC.

Abstract

This study investigates the application of machine learning methods to identify fraudulent claims in animal healthcare. The researchers used publicly available veterinary claims data, with regulatory exclusion databases serving as the source of fraud labels. Three supervised machine learning models were compared: C4.5 Decision Tree, Logistic Regression, and Support Vector Machine (SVM). Their performance was evaluated using Area Under the ROC Curve (AUC), False Positive Rate (FPR), False Negative Rate (FNR), precision, and recall.
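To make the evaluation metrics concrete, the following is a minimal sketch of how they can be computed from labels, predictions, and scores. It is not the authors' code: the function names, the convention that label 1 means fraud, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as fraud (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute FPR, FNR, precision, and recall; empty denominators give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "fpr": fp / (fp + tn) if fp + tn else 0.0,        # legitimate claims flagged as fraud
        "fnr": fn / (fn + tp) if fn + tp else 0.0,        # fraudulent claims that were missed
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # flagged claims that are truly fraud
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # fraudulent claims that were caught
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fraud claim outscores a random
    legitimate one, with ties counted as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three fraudulent and five legitimate claims.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(rates(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))
```

Note that FNR is simply 1 minus recall, so a model with the lowest FNR also has the highest recall. The pairwise AUC loop is quadratic in the number of claims and is meant only to show the definition; a rank-based computation would be preferable on real claims data.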

The findings show that the C4.5 Decision Tree outperformed the other two models in terms of AUC, recall, and FNR, making it the most effective of the three approaches for detecting fraudulent claims in veterinary healthcare. The C4.5 model achieved an AUC of 0.883 at an 80:20 class distribution and exhibited the lowest FNR, meaning it missed the fewest fraudulent claims. Logistic Regression showed high precision but a higher FNR, reflecting a trade-off between precision and recall. SVM performed worse overall than the other two models, particularly in AUC and FNR.
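The abstract does not say how the 80:20 class distribution was obtained, nor which class holds the 80% share. As one plausible illustration only, the sketch below assumes random undersampling of the non-fraud majority so that non-fraud claims make up 80% of the training set; the function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def undersample_to_ratio(labels: Sequence[int],
                         majority_share: float = 0.8,
                         seed: int = 0) -> List[int]:
    """Return sorted row indices that keep every fraud row (label 1) and a
    random subset of non-fraud rows (label 0), so that non-fraud rows make up
    roughly `majority_share` of the result. Assumed technique, not the paper's."""
    if not 0.0 < majority_share < 1.0:
        raise ValueError("majority_share must be strictly between 0 and 1")
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    # Number of non-fraud rows needed for the requested share.
    target = round(len(fraud_idx) * majority_share / (1.0 - majority_share))
    if target > len(legit_idx):
        raise ValueError("not enough non-fraud rows to reach the requested share")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept_legit = rng.sample(legit_idx, target)
    return sorted(fraud_idx + kept_legit)


# Toy data (hypothetical): 10 fraudulent claims among 1,000.
labels = [1] * 10 + [0] * 990
keep = undersample_to_ratio(labels, majority_share=0.8)
print(len(keep), sum(labels[i] for i in keep))
```

Under this assumption, only the training data would be resampled; evaluation metrics such as FNR are more meaningful when computed on data that keeps the original class balance.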

The results highlight the potential of machine learning to enhance fraud detection systems in animal healthcare by identifying fraudulent claims that might otherwise be overlooked. Future research could explore additional data sources, feature engineering, and alternative sampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) to further improve detection. This study contributes to the growing body of work that applies machine learning to detect fraud and support the proper allocation of resources in animal healthcare.

Author Biographies

Dr. Kondragunta Rama Krishnaiah

R K College of Engineering, Vijayawada 521456, Andhra Pradesh, India

Dr. Harish H

R K College of Engineering, Vijayawada 521456, Andhra Pradesh, India
