Development of a Web-Based Diabetes Risk Monitoring System Prototype Using K-Means for Risk Segmentation
Abstract
Diabetes risk monitoring systems often focus on data recording and visualization and offer limited analytical support for identifying interpretable risk patterns in health-related data. This study developed a web-based diabetes risk monitoring prototype that integrates K-Means clustering for analytical risk segmentation. Using the Pima Indians Diabetes Database (768 records), the analysis pipeline comprised handling of invalid zero values, median imputation, Min-Max scaling, K-Means clustering, internal validation, cluster profiling, and artifact preparation. Two preprocessing scenarios and several candidate values of k were evaluated. The final model was selected under the Extended Scenario at k = 2 and produced two analytical risk segments: Cluster 0, labeled Higher Risk, contained 270 records (35.16%), and Cluster 1, labeled Lower Risk, contained 498 records (64.84%). The prototype was implemented using Flask, SQLite, HTML/CSS, Chart.js, and a saved model bundle. It supports data input, result display, prediction history, edit and delete functions, CSV export, and dashboard visualization. In black-box testing, all 27 functional test cases passed. The main contribution is the integration of K-Means-based segmentation into an operational web prototype intended for structured monitoring, not clinical diagnosis.
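The preprocessing and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library rather than the scikit-learn components listed in the references, the six toy rows are made up, the helper names (`impute_zeros_with_median`, `min_max_scale`, `k_means`, `cluster_share`) are hypothetical, and the centroid initialization is a simple deterministic choice assumed here for reproducibility. Internal validation and cluster profiling are omitted.

```python
from statistics import median

def impute_zeros_with_median(rows, cols):
    """Treat 0 in the listed columns as missing; fill with the median of non-zero values."""
    out = [list(r) for r in rows]
    for c in cols:
        med = median(r[c] for r in rows if r[c] != 0)
        for r in out:
            if r[c] == 0:
                r[c] = med
    return out

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iters=100):
    """Lloyd's algorithm with an assumed deterministic start (evenly spaced rows)."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def cluster_share(count, total):
    """Percentage of records in a cluster, rounded to two decimals."""
    return round(100 * count / total, 2)

# Made-up rows: [glucose, bmi, age]; a 0 in glucose or bmi marks a missing value.
toy = [
    [90, 22.0, 25],
    [95, 0, 30],
    [0, 24.0, 28],
    [170, 35.0, 55],
    [180, 0, 60],
    [165, 38.0, 50],
]

imputed = impute_zeros_with_median(toy, cols=[0, 1])
scaled = min_max_scale(imputed)
labels, centroids = k_means(scaled, k=2)
print(labels)

# Cluster sizes reported in the abstract, expressed as shares of 768 records.
print(cluster_share(270, 768), cluster_share(498, 768))
```

Cluster indices from K-Means are arbitrary, so a label such as Higher Risk or Lower Risk has to be assigned afterwards by inspecting each cluster's profile, which is the role of the cluster profiling step mentioned in the abstract.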

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.