Proactive Detection of Malicious Webpages Using Hybrid Natural Language Processing and Ensemble Learning Techniques

Althaf Ali A; Rama Devi K; Syed Siraj Ahmed N; Ramchandran P; Parvathi S

doi:10.31341/jios.48.2.4

Authors

Althaf Ali A, Department of Computer Application, Madanapalle Institute of Technology & Science, Madanapalle, India
Rama Devi K, Department of Information Technology, Panimalar Engineering College, Chennai, India
Syed Siraj Ahmed N, School of Computer Science Engineering and Information Science, Presidency University, Bangalore, India
Ramchandran P, Department of Computer Application, Parul Institute of Engineering and Technology, Parul University, P.O. Limda, Tal. Waghodia, Dist. Vadodara, India-391760
Parvathi S, Department of Computer Science and Engineering, Erode Sengunthar Engineering College, Erode, India

DOI:

https://doi.org/10.31341/jios.48.2.4

Keywords:

Count, Term frequency and Inverse document frequency, Machine learning model, Phishing, Malicious webpages

Abstract

The proliferation of malicious webpages presents a growing threat to online security, necessitating advanced detection methods to mitigate risks. This paper proposes a novel approach that integrates Natural Language Processing (NLP) techniques with an ensemble of machine learning models for the proactive detection of malicious web content. By leveraging semantic analysis, lexical patterns, and metadata extraction, the proposed framework enhances the identification of suspicious patterns in web page content. The ensemble model combines decision trees, random forests, and gradient boosting methods, optimizing classification accuracy and reducing false positives. A comprehensive evaluation using a large dataset of web pages, including both benign and malicious examples, demonstrates the superiority of the proposed method over traditional single-model approaches. With accuracy rates exceeding 98%, this framework achieves a robust, scalable solution for real-time web content analysis, providing a critical tool for cybersecurity professionals to detect and block malicious webpages before they can cause harm. Future directions include the integration of deep learning architectures and adaptive filtering techniques to further refine detection capabilities.
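As a rough illustration of the two ideas named in the keywords and abstract (term frequency-inverse document frequency features and an ensemble decision), the following minimal sketch may help. It is not the authors' implementation: the function names `tokenize`, `tfidf`, and `majority_vote` are hypothetical, the weighting `tf * log(N / df)` is one common TF-IDF variant, and hard majority voting is assumed here only because the abstract does not state how the decision tree, random forest, and gradient boosting outputs are combined.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


def majority_vote(predictions):
    """Hard voting (an assumption): return the label predicted by most models."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical page texts, invented purely for illustration.
pages = [
    "verify your account password now",
    "weather report for your city",
]
vectors = tfidf(pages)

# Stand-ins for the labels that three separately trained models might return.
label = majority_vote(["malicious", "benign", "malicious"])
```

In this toy corpus a term that appears on every page, such as "your", receives a weight of zero, while a term confined to one page, such as "password", receives a positive weight. That is the property that makes TF-IDF useful for separating distinctive page content from common wording.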
