OPTIMIZATION OF LUNG CANCER CLASSIFICATION METHOD USING EDA-BASED MACHINE LEARNING
DOI:
https://doi.org/10.34012/jurnalsisteminformasidanilmukomputer.v6i2.3413
Abstract
Lung cancer is one of the three deadliest diseases in the world and has been developing rapidly. This study therefore seeks to predict the factors that influence lung cancer, using data mining with classification techniques. Several popular classification algorithms are compared to determine which is the most accurate for lung cancer classification: K-Nearest Neighbor, Random Forest Classifier, Logistic Regression, Linear SVM, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, Kernel SVM, and MLPClassifier. These algorithms were chosen because earlier research that the researchers found on the Kaggle platform compared them on a breast cancer dataset. In that earlier research on breast cancer, SVM obtained an accuracy of 96.47%, Neural Networks 97.06%, and Naïve Bayes 91.18%. This study differs from the previous research in that it uses the wider set of machine learning algorithms listed above. In addition, it examines whether the accuracy of these algorithms differs when they are applied to a lung cancer dataset.
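The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn (suggested by the name MLPClassifier but not stated in the abstract), uses synthetic data from `make_classification` in place of the lung cancer dataset, and the split ratio, scaling step, hyperparameters, and the helper names `scaled` and `compare` are all assumptions. Random Forest appears twice in the list of algorithms and is included once here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, NOT the lung cancer dataset from the study.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def scaled(model):
    """Hypothetical helper: standardize features before a scale-sensitive model."""
    return make_pipeline(StandardScaler(), model)


# Hyperparameters are illustrative defaults, not values reported by the study.
models = {
    "K-Nearest Neighbor": scaled(KNeighborsClassifier()),
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "Linear SVM": scaled(LinearSVC(max_iter=10000)),
    "Kernel SVM": scaled(SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "MLPClassifier": scaled(MLPClassifier(max_iter=2000, random_state=0)),
}


def compare(models, X_train, X_test, y_train, y_test):
    """Fit every model on the training split and return its test accuracy."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


results = compare(models, X_train, X_test, y_train, y_test)

# Print the models from most to least accurate on the held-out split.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.4f}")
```

The accuracies printed by this sketch describe only the synthetic data and say nothing about the study's results. A single train/test split is used for brevity; cross-validation would give a more stable comparison.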
The results of this study show that the most accurate algorithms were Random Forest and Gradient Boosting, each with an accuracy of 100%. The previous research found the same two algorithms to be the most accurate, although there Gradient Boosting had a higher accuracy than Random Forest. Based on the data used in this study, the most influential factors in predicting a diagnosis of lung cancer are obesity and coughing up blood.
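One common way to rank influential factors with a tree ensemble is through impurity-based feature importances. The abstract does not say how the factors were identified, so the sketch below is only an assumption about one plausible approach. It again uses scikit-learn and synthetic data, and the column names `feature_0` to `feature_7` are placeholders for real attributes such as obesity or coughing up blood.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic stand-in data. With shuffle=False the first three
# columns are the informative ones and the remaining five are noise.
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# Fit a random forest and read its impurity-based importances (they sum to 1).
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank the features from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:12s} {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a check with permutation importance on held-out data is often worthwhile before drawing conclusions.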
License
Copyright (c) 2023 Windania Purba, Sumita Wardani, Diana Febrina Lumbantoruan, Fransiska Celia Ivoi Silalahi, Thomas Leo Edison
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish their manuscripts through the Journal of Information Systems and Computer Science agree to the following:
- Copyright to the manuscripts of scientific papers in this Journal is held by the author.
- The author surrenders first-publication rights for the manuscript and simultaneously grants other parties permission (a license), under the Creative Commons Attribution-ShareAlike 4.0 International License, to distribute the work, provided that credit is given to the author and to the Journal of Information Systems and Computer Science as the first publication medium for the work.
- Matters relating to the non-exclusive distribution of the version of the work published in the Journal can be agreed separately (for example, requests to place the work in the library of an institution or to publish it as a book), with the author as one of the parties to the agreement and with credit to the Journal of Information Systems and Computer Science as the first publication medium for the work in question.
- Authors can and are expected to publish their work online (e.g. in a Repository or on their Organization's/Institution's website) before and during the manuscript submission process, as such efforts can increase citation exchange earlier and with a wider scope.