Securing healthcare systems: a random forest approach to malicious URL detection

Nair, A.; Patel, P.; Vadher, H.; Patel, M.; Vyas, T.; Bhatt, C.; Conte, L.; De Nunzio, G.

doi:10.1007/s11416-025-00579-9

Purpose: With technological advancements, uniform resource locators (URLs) are increasingly used in healthcare to access patient records stored electronically, reducing paperwork. However, malicious URLs can deceive users and lead to data breaches. Machine learning (ML) offers a solution by learning from past data to predict whether a URL is malicious or benign.

Methods: A dataset from GitHub containing 151,828 URL samples was pre-processed, revealing characteristics that distinguish malicious URLs. Ad hoc (hand-crafted) feature extraction techniques were applied to capture these traits. To classify URLs, several supervised ML classifiers were used: logistic regression (LR), perceptron, decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), gradient boosting (GB), k-nearest neighbors (KNN), support vector machine (SVM), CatBoost (CB), multinomial naive Bayes (MNB), Bernoulli naive Bayes (BNB), light gradient boosting machine (LGBM), and passive aggressive classifier (PAC). Additionally, "automatic" feature extraction was performed using the term frequency-inverse document frequency (TF-IDF) method, and the extracted features were used with LR, DT, RF, XGBoost, CB, KNN, LGBM, PAC, MNB, and BNB.

Results: Automatic feature extraction improved classification accuracy, making it a reliable method for detecting malicious URLs. The RF classifier performed best with both methods, achieving 99.82% accuracy with automatic feature extraction compared to 99.57% with hand-crafted features. With automatic feature extraction, RF also reached 99.84% precision, 99.44% recall, and a 99.64% F1 score.

Conclusion: This approach has potential applications in securing healthcare systems, web browsers, and cybersecurity platforms, helping prevent unauthorized access to sensitive information.
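To make the "automatic" feature extraction step more concrete, the following is a minimal sketch of TF-IDF weighting applied to URL strings, written with the Python standard library only. The abstract does not state how URLs were tokenized or which TF-IDF variant was used, so the tokenizer (splitting on non-alphanumeric characters), the smoothed IDF formula, the function names `tokenize` and `tfidf`, and the example URLs are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tfidf(urls):
    """Return (idf, rows): idf maps token -> IDF, rows[i] maps token -> TF-IDF."""
    docs = [Counter(tokenize(u)) for u in urls]
    n = len(docs)

    # Document frequency: number of URLs in which each token appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    # A token present in every URL gets IDF 1; rarer tokens get more.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    rows = []
    for doc in docs:
        total = sum(doc.values())
        # Term frequency is the token count divided by the URL's token count.
        rows.append({t: (c / total) * idf[t] for t, c in doc.items()})
    return idf, rows


# Hypothetical URLs, not taken from the paper's dataset.
urls = [
    "http://example.com/login",
    "http://example.com/records",
    "http://paypa1-secure.example.net/login.php",
]
idf, rows = tfidf(urls)
for url, row in zip(urls, rows):
    top = sorted(row, key=row.get, reverse=True)[:3]
    print(url, "->", top)
```

Tokens shared by every URL, such as `http`, receive the minimum weight, while tokens that occur in only one URL are weighted more heavily. In a pipeline like the one the abstract describes, such weight vectors would then be passed to a classifier such as RF.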

2025-01-01
