Comparative Analysis of Linear Regression and Random Forest Algorithms for Skincare Product Sales Prediction on the Tokopedia Marketplace
Abstract
This study evaluates and compares the performance of the Linear Regression and Random Forest algorithms in predicting skincare product sales on the Tokopedia platform. Accurate sales predictions play a strategic role for businesses, particularly in formulating pricing policy, positioning products, and managing inventory more efficiently. The dataset comprises 7,217 skincare products described by six main variables, which were expanded into 21 features through preprocessing and feature engineering, including extraction of product type, main benefits, seller characteristics, location tier, and price position category. The models were trained and tested on an 80:20 split of the data. The results show that Random Forest performed better than Linear Regression. On the test data, Random Forest achieved an R² of 0.2115 with an RMSE of 32,754.23 and an MAE of 12,073.59, whereas Linear Regression obtained an R² of only 0.0245 with an RMSE of 36,431.74 and an MAE of 14,688.99. Although both R² values are low, the results are still realistic given the complexity of e-commerce sales behavior, which is influenced by various external factors. Feature importance analysis shows that price and price per unit are the most dominant factors, followed by rating, number of product benefits, brand category, and product type. Based on these findings, Random Forest is recommended as the more suitable model for predicting skincare product sales on Tokopedia, mainly because of its ability to capture non-linear patterns and interactions between features. This study is expected to contribute to the development of data-driven marketing strategies, particularly in pricing, multi-benefit product development, and skincare portfolio management on marketplace platforms.
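The three evaluation metrics reported above (R², RMSE, MAE) and the 80:20 split can be illustrated with a minimal sketch using only the Python standard library. The function names, the random seed, and the toy values are assumptions for illustration; they are not the authors' code or data, and the study's own split procedure may differ.

```python
import math
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of `rows` and split it 80% train / 20% test.

    The seed is a hypothetical choice; the study does not state one.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy sales figures, invented purely to exercise the functions.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 320.0, 380.0]

# Placeholder row indices standing in for a dataset of 7,217 products.
train, test = split_80_20(range(7217))

print(len(train), len(test))
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

A useful reference point when reading the reported scores: a model that always predicts the mean of the test targets has an R² of exactly 0, so an R² of 0.0245 indicates only a marginal improvement over that baseline, while 0.2115 indicates a modest one.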
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.