User-Centric Phishing URL Detection Tool Powered by Interpretable Machine Learning Model.
LinkScope is a senior project focused on delivering an effective phishing detection solution
through two main components: Model Development and Web Application. As a data scientist, I was
responsible for developing the machine learning model and handling backend development. The
Model Development component involves training a machine learning model to distinguish between
legitimate and phishing URLs, while the Web Application provides a user-friendly interface for
accessing these detection functionalities.
My paper, "User-Centric Phishing URL Detection Tool Powered by Interpretable Machine Learning
Model," was accepted at ISCIT 2024 and will be published in IEEE Xplore.
In a web application, accuracy is not the only concern: processing time is also critical. When a user submits a URL for analysis, feature extraction adds to the overall latency, so accuracy has to be balanced against processing time.
LightGBM trained on all features and LightGBM with RFECV reach approximately the same accuracy, the highest observed, at 94.61%. Extracting every feature, however, is a drawback for the web application, where user experience depends on quick responses. LightGBM with RFECV is therefore chosen, which reduces the feature set to 26.
Algorithm | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
LightGBM | 94.61% | 95.68% | 93.44% | 94.61% |
LightGBM (Hyperparameterized) | 95.07% | 96.00% | 94.05% | 95.07% |
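
For readers unfamiliar with the four columns, the sketch below shows how accuracy, precision, recall, and F1-score are conventionally computed for the positive (phishing) class from confusion-matrix counts. The function name and the counts in the usage note are illustrative assumptions, not values from this project. The source does not state which averaging was used for the table, so its figures may follow a different convention (for example macro or weighted averaging).

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary-classification metrics for the positive class.

    tp/fp/tn/fn are confusion-matrix counts. This is the textbook
    definition, not necessarily the averaging used in the table.
    """
    total = tp + fp + tn + fn
    # Guard against empty denominators so the function never divides by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With made-up counts, `classification_metrics(tp=90, fp=10, tn=85, fn=15)` gives an accuracy of 0.875 and a precision of 0.9.
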
The table reports the classification results of LightGBM with RFECV. Hyperparameter tuning raised the accuracy from 94.61% to 95.07%. Recursive Feature Elimination with Cross-Validation (RFECV) reduced the feature set to 26 features: 'domainlength', 'www', 'subdomain', 'https', 'short_url', '@', '-', '=', '.', '_', '/', 'digit', 'log', 'pay', 'web', 'account', 'pcemptylinks', 'pcextlinks', 'pcrequrl', 'zerolink', 'extfavicon', 'submit2email', 'sfh', 'redirection', 'domainage', and 'domainend'. This feature set, together with the hyperparameter-tuned LightGBM model, is chosen for deployment in the application.
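
The first 16 features in this list can be read off the URL string itself, which makes them cheap to compute compared with features that need the page content or a domain lookup. The following is a minimal sketch of how such lexical features might be extracted. It is not the project's actual extractor: the exact definitions (counts versus binary flags, what qualifies as a subdomain) and the `SHORTENERS` list are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lists: the source names the features but not their definitions.
KEYWORDS = ("log", "pay", "web", "account")
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
SPECIAL_CHARS = ("@", "-", "=", ".", "_", "/")


def extract_lexical_features(url: str) -> dict:
    """Return features derived from the URL string only (no network access)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    labels = [label for label in host.split(".") if label]
    # Assumption: the last two labels form the registered domain, and any
    # other label except "www" counts as a subdomain.
    extra_labels = [label for label in labels[:-2] if label != "www"]

    features = {
        "domainlength": len(host),
        "www": int("www" in labels),
        "subdomain": int(len(extra_labels) > 0),
        "https": int(parsed.scheme == "https"),
        "short_url": int(host in SHORTENERS),
        "digit": sum(ch.isdigit() for ch in url),
    }
    # Assumption: special characters are counted over the whole URL.
    for ch in SPECIAL_CHARS:
        features[ch] = url.count(ch)
    # Assumption: keyword features are binary "appears anywhere" flags.
    lowered = url.lower()
    for word in KEYWORDS:
        features[word] = int(word in lowered)
    return features
```

For example, `extract_lexical_features("https://www.secure-login.example.com/account/verify?id=123")` sets `https`, `www`, `subdomain`, `log`, and `account` to 1 and counts one `-` and three digits. The remaining ten features (such as 'pcextlinks', 'sfh', and 'domainage') depend on the fetched page or on domain registration data, which is where most of the extraction latency would be expected to come from.
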