Optimizing Supervised Learning Methods Using Particle Swarm Optimization for Malware Detection
DOI:
https://doi.org/10.34012/jutikomp.v6i2.4281
Keywords:
Malware Detection, Particle Swarm Optimization, Supervised Learning
Abstract
This research addresses malware detection, a problem that arises when users access the internet and download files that have been infiltrated by malware. A popular solution today is to use machine learning: models are trained on distinctive features of known samples so that they can predict whether a given piece of software is malicious or harmless. The dataset used is a malware detection dataset from Kaggle, which is classified using an ensemble classifier, an algorithm in the supervised learning category. Classification is then improved through feature optimization using Particle Swarm Optimization (PSO). The Ensemble algorithm alone produced an accuracy of 92% and an AUC of 0.94. After the classification was optimized with PSO, accuracy increased by 7.32% to 100% and AUC increased by 0.059 to 1. Based on these results, feature selection is recommended before building a classification model for malware detection.
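The feature-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary PSO variant with a sigmoid transfer function, uses synthetic data in place of the Kaggle dataset, and uses a nearest-centroid classifier as a lightweight stand-in for the ensemble classifier. All function names and parameter values (`binary_pso`, `masked_accuracy`, `w`, `c1`, `c2`, and so on) are hypothetical.

```python
import math
import random


def make_data(rng, n_samples=120, n_informative=3, n_noise=7):
    """Synthetic two-class data (assumption): the first columns carry signal, the rest are noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def masked_accuracy(mask, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier using only the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])) for c in (0, 1)}
        correct += int(min(dist, key=dist.get) == t)
    return correct / len(y_val)


def binary_pso(fitness, n_features, rng, n_particles=12, n_iter=25, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 mask over features; higher fitness is better."""
    pos = [[int(rng.random() < 0.5) for _ in range(n_features)] for _ in range(n_particles)]
    pos[0] = [1] * n_features  # seed one particle with the full feature set
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp velocity to keep the sigmoid stable
                # Sigmoid transfer: velocity becomes the probability that the bit is 1.
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


rng = random.Random(0)
X, y = make_data(rng)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]
n_features = len(X[0])


def fitness(mask):
    return masked_accuracy(mask, X_tr, y_tr, X_val, y_val)


baseline = fitness([1] * n_features)  # accuracy with every feature kept
best_mask, best_fit = binary_pso(fitness, n_features, rng)
print("all features:", baseline, "selected:", best_mask, "accuracy:", best_fit)
```

Because the fitness function here is validation accuracy, the selected subset is tuned to that split. In practice the final score should be reported on a separate test set that the optimizer never sees, otherwise the reported accuracy can be optimistic.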
License
Copyright (c) 2023 Mayadi, Ismaniah, Tyastuti Sri Lestari, Wowon Priatna
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
- Copyright in the scholarly manuscripts published in this Journal is held by the Authors.
- Authors grant the Journal the right of first publication of their manuscript and, at the same time, license the work under the Creative Commons Attribution-ShareAlike 4.0 International License, which allows others to distribute the work provided they credit the authors and Jurnal Teknologi dan Ilmu Komputer Prima as the medium of first publication.
- Matters concerning non-exclusive distribution of the Journal's published version of the work (for example, a request to place the work in an institution's library or to publish it as a book) may be agreed separately, with the Author as one of the parties to the agreement and with acknowledgment of Jurnal Teknologi dan Ilmu Komputer Prima as the medium of first publication.
- Authors may, and are encouraged to, post their work online (for example, in a repository or on their organization's or institution's website) before and during the submission process, since doing so can lead to earlier and wider exchange of citations.