TITLE:
A Scalable Multidimensional Data Warehouse with Machine Learning for Real-Time Diabetes Management in Bangladesh
AUTHORS:
Md. Al Mamun, Omar Faruq Tanim, Dulal Chakraborty, Muhammad Saidur Rahman, Mohammad Shorif Uddin
KEYWORDS:
Dimension, Fact Table, ETL, GUI, Aggregate, Query, Accuracy
JOURNAL NAME:
Journal of Computer and Communications, Vol.14 No.6, June 26, 2026
ABSTRACT:
Purpose: Diabetes presents a major public health challenge in Bangladesh, demanding effective data-driven solutions for improved disease monitoring and management. This study aims to design and implement a scalable, multidimensional data warehouse integrated with statistical and machine learning techniques to support batch-based diabetes monitoring, prediction, and decision-making. The key research question is: Can a unified data-driven framework improve diabetes classification accuracy and provide actionable clinical insights for healthcare systems in Bangladesh?
Methods: Clinical and demographic data from selected hospitals were consolidated into a centralized data warehouse. A Python-based GUI enabled interactive data access and visualization. Statistical analyses (ANOVA, Chi-square) assessed associations between demographic, clinical, and lifestyle factors. For predictive modeling, four supervised learning algorithms (Logistic Regression, Decision Tree, Multilayer Perceptron (MLP), and LightGBM) were trained and evaluated for diabetes type classification.
Results: Statistical analysis revealed significant associations between gender, treatment cost, and patient satisfaction; blurred vision and diabetes longevity; and lifestyle habits and weight loss. Among the machine learning models tested, Logistic Regression demonstrated the best overall performance, achieving 81.25% accuracy, 82.07% precision, 81.3% recall, an F1-score of 81.48%, a ROC-AUC of 0.8278, and a log loss of 0.5029.
Conclusions: The integrated data warehouse and machine learning framework offers a scalable, batch-based prediction system for diabetes management in Bangladesh. It combines statistical insights with predictive modeling to support clinical decision-making and is adaptable across healthcare settings. This approach meets the urgent need for actionable, data-driven insights into chronic disease care and advances the country’s digital health transformation.
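As a rough illustration of the dimension, fact table, and aggregate query concepts named in the keywords, the sketch below builds a tiny star schema in an in-memory SQLite database and runs one aggregate query. It is not the authors' schema: the table names (dim_patient, dim_hospital, fact_visit), the columns, and every data value are hypothetical and chosen only to make the example self-contained.

```python
import sqlite3


def build_demo_warehouse():
    """Create a toy star schema in memory (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_patient (
            patient_id INTEGER PRIMARY KEY,
            gender     TEXT
        );
        CREATE TABLE dim_hospital (
            hospital_id INTEGER PRIMARY KEY,
            name        TEXT
        );
        -- The fact table holds one row per visit plus numeric measures.
        CREATE TABLE fact_visit (
            visit_id       INTEGER PRIMARY KEY,
            patient_id     INTEGER REFERENCES dim_patient(patient_id),
            hospital_id    INTEGER REFERENCES dim_hospital(hospital_id),
            treatment_cost REAL
        );
        """
    )
    # Made-up rows standing in for the "load" step of an ETL process.
    conn.executemany(
        "INSERT INTO dim_patient VALUES (?, ?)",
        [(1, "F"), (2, "M"), (3, "F")],
    )
    conn.executemany(
        "INSERT INTO dim_hospital VALUES (?, ?)",
        [(1, "Hospital A"), (2, "Hospital B")],
    )
    conn.executemany(
        "INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 100.0), (2, 2, 1, 200.0), (3, 3, 2, 300.0), (4, 1, 2, 500.0)],
    )
    conn.commit()
    return conn


def mean_cost_by_gender(conn):
    """Aggregate the fact table along the patient dimension."""
    return conn.execute(
        """
        SELECT p.gender, COUNT(*), AVG(f.treatment_cost)
        FROM fact_visit AS f
        JOIN dim_patient AS p ON p.patient_id = f.patient_id
        GROUP BY p.gender
        ORDER BY p.gender
        """
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    for gender, visits, mean_cost in mean_cost_by_gender(warehouse):
        print(gender, visits, mean_cost)
```

Keeping measures such as cost in the fact table and descriptive attributes in the dimensions is what lets a single join plus GROUP BY answer questions of the form "measure by attribute"; a production warehouse would add further dimensions (time, diagnosis, and so on) following the same pattern.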