TITLE:
Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features
AUTHORS:
Aseer Al Faisal, Atiqur Rahman
KEYWORDS:
Phishing Detection, Deep Reinforcement Learning, RoBERTa Semantic Embeddings, Quantile Regression Deep Q-Network, QR-DQN, URL Classification, Lexical Features, Cybersecurity
JOURNAL NAME:
Journal of Information Security, Vol.17 No.3, June 17, 2026
ABSTRACT: Phishing is a form of cybercrime in which people are deceived into exposing personal information, which can result in financial loss. These attacks are often carried out via fraudulent messages, misleading advertisements, and compromised legitimate websites. This study proposes a framework based on the Quantile Regression Deep Q-Network (QR-DQN) that integrates RoBERTa semantic embeddings with handcrafted lexical features to enhance phishing detection. Instead of predicting mean returns, QR-DQN uses quantile regression to model the distribution over returns; combined with semantic embeddings, this improves stability and generalization to previously unseen phishing samples relative to traditional DQN approaches. A diverse, custom-crawled dataset of 105,000 URLs was curated from PhishTank, OpenPhish, Cloudflare, and other sources, and the framework uses an 80/20 split of this dataset. The QR-DQN model with RoBERTa embeddings and lexical features achieved a test accuracy of 99.86%, precision of 99.75%, recall of 99.96%, and F1-score of 99.85%. Compared with the standard DQN with lexical features, the proposed QR-DQN framework with lexical and semantic features lowers the generalization gap from 1.66% to 0.04%. Under 5-fold cross-validation, the results were consistent, with a mean accuracy of 99.90% and a standard deviation of 0.04%. These results indicate that the hybrid technique, which combines quantile-based value estimation with RoBERTa semantic embeddings and lexical features, achieves strong performance and a reduced generalization gap.
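To illustrate what modeling the distribution over returns with quantile regression can look like, the following is a minimal sketch of the quantile Huber loss commonly used to train QR-DQN. It is not taken from the paper: the function names, the quantile midpoints, and the threshold `kappa` are assumptions chosen for illustration, and the authors' implementation may differ.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic near zero, linear beyond kappa (assumed threshold)."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target return samples.

    pred_quantiles: N predicted quantile values for one state-action pair.
    target_samples: M samples of the target return distribution.
    Both names are hypothetical; kappa must be positive.
    """
    n = len(pred_quantiles)
    m = len(target_samples)
    total = 0.0
    for i, theta in enumerate(pred_quantiles):
        # Quantile midpoint tau_i = (2i + 1) / (2N) for zero-based index i.
        tau = (2 * i + 1) / (2 * n)
        row = 0.0
        for t in target_samples:
            u = t - theta  # temporal-difference style error
            # Asymmetric weight: tau for under-estimates, (1 - tau) for over-estimates.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            row += weight * huber(u, kappa) / kappa
        total += row / m  # average over target samples
    return total  # summed over predicted quantiles


if __name__ == "__main__":
    # A perfect prediction gives zero loss; errors are penalized asymmetrically.
    print(quantile_huber_loss([1.0, 1.0], [1.0]))
    print(quantile_huber_loss([0.0, 0.0], [1.0]))
```

Because each quantile is weighted asymmetrically, the learned values spread out to cover the return distribution instead of collapsing to its mean, which is the property the abstract contrasts with a standard DQN.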