An effective detection approach for phishing websites using URL and HTML features

Published:2022-05-25 Issue:1 Volume:12
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Aljofey Ali,Jiang Qingshan,Rasool Abdur,Chen Hui,Liu Wenyin,Qu Qiang,Wang Yang

Abstract

Phishing websites are a growing threat because they are hard to detect. They rely on internet users mistaking them for genuine sites and unknowingly revealing private information such as login IDs, passwords, and credit card numbers. This paper proposes a new approach to the anti-phishing problem. Its features are the URL character sequence, which requires no prior knowledge of phishing, various kinds of hyperlink information, and the textual content of the webpage; these are combined and used to train an XGBoost classifier. One of the major contributions of this paper is the selection of new features that are capable of detecting zero-hour attacks and do not depend on any third-party services. In particular, we extract character-level Term Frequency-Inverse Document Frequency (TF-IDF) features from the noisy parts of the HTML and the plaintext of the given webpage. Moreover, our proposed hyperlink features determine the relationship between the content and the URL of a webpage. Because large phishing data sets are not publicly available, we created our own data set of 60,252 webpages to validate the proposed solution. It contains 32,972 benign webpages and 27,280 phishing webpages. For evaluation, we measure the performance of each category of the proposed feature set and employ various classification algorithms. The empirical results show that the proposed individual features are valuable for phishing detection, and that integrating all the features further improves detection accuracy. The proposed approach achieved an accuracy of 96.76% with a false-positive rate of only 1.39% on our dataset, and an accuracy of 98.48% with a false-positive rate of 2.09% on a benchmark dataset, outperforming the existing baseline approaches.
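The abstract says the hyperlink features relate a page's content to its URL but does not list them. The sketch below is one possible illustration of that idea, not the authors' feature set: it counts how many anchor links on a page point to the page's own host, to other hosts, or nowhere. The function name `hyperlink_features`, the feature names, and the example URLs are all assumptions made for this example. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or valueless href is recorded as an empty string.
            self.hrefs.append(dict(attrs).get("href") or "")


def hyperlink_features(page_url, html):
    """Return illustrative link ratios for one page (hypothetical feature names)."""
    collector = _LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname or ""

    internal = external = empty = 0
    for href in collector.hrefs:
        href = href.strip()
        # Links that lead nowhere: empty, bare fragment, or javascript: pseudo-links.
        if href in ("", "#") or href.lower().startswith("javascript:"):
            empty += 1
            continue
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, href)).hostname or ""
        if host == page_host:
            internal += 1
        else:
            external += 1

    total = len(collector.hrefs)

    def ratio(count):
        return count / total if total else 0.0

    return {
        "total_links": total,
        "internal_ratio": ratio(internal),
        "external_ratio": ratio(external),
        "empty_ratio": ratio(empty),
    }


# Usage with a made-up page: two internal links, one external, one empty.
sample_html = """
<a href="/help">Help</a>
<a href="https://example-bank.com/about">About</a>
<a href="https://other.example.net/x">Partner</a>
<a href="#">Login</a>
"""
features = hyperlink_features("https://example-bank.com/login", sample_html)
print(features)
```

In a pipeline like the one the abstract describes, ratios of this kind would be concatenated with the URL and text features before training the classifier. How the paper actually computes and weights them is not stated in the abstract.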

Funder

the National Key Research and Development Program of China

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-022-10841-5.pdf

Reference 47 articles.

1. RSA. RSA fraud report. https://go.rsa.com/l/797543/2020-07-08/3njln/797543/48525/RSA_Fraud_Report_Q1_2020.pdf (2020) (Accessed 14 January 2021).

2. APWG. Phishing Attack Trends Reports, 24, November 2020. https://docs.apwg.org/reports/apwg_trends_report_q3_2020.pdf (2020) (Accessed 14 January 2021).

3. Aljofey, A., Jiang, Q., Qu, Q., Huang, M. & Niyigena, J.-P. An effective phishing detection model based on character level convolutional neural network from URL. Electronics 9, 1514 (2020).

4. Dhamija, R., Tygar, J.D., & Hearst, M. Why phishing works. in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 22–27 April 2006, 581–590 (2006).

5. Jain, A. K. & Gupta, B. B. A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP J. on Info. Security. 9, 1–11. https://doi.org/10.1186/s13635-016-0034-3 (2016).

Cited by 40 articles.

1. ENHANCING BANK CUSTOMER PROTECTION AGAINST PHISHING ATTACKS THROUGH XGBOOST-BASED FEATURE ANALYSIS;Transmisi: Jurnal Ilmiah Teknik Elektro;2024-07-31

2. Unveiling suspicious phishing attacks: enhancing detection with an optimal feature vectorization algorithm and supervised machine learning;Frontiers in Computer Science;2024-07-02

3. Detection of phishing URLs with deep learning based on GAN-CNN-LSTM network and swarm intelligence algorithms;Signal, Image and Video Processing;2024-06-17

4. Browser‐in‐the‐middle attacks: A comprehensive analysis and countermeasures;SECURITY AND PRIVACY;2024-05-28

5. NAISS: A reverse proxy approach to mitigate MageCart's e-skimmers in e-commerce;Computers & Security;2024-05