Journal of Biosciences and Medicines (JBM)
ISSN: 2327-5081
Publisher: Scientific Research Publishing
DOI: 10.4236/jbm.2026.145032
Article ID: JBM-151611
Article type: Articles
Subject: Biomedical & Life Sciences

Statistical Models and Machine Learning in Predicting Childhood Obesity and Related Metabolic Disorders

Ruiqi Huo

The Second Hospital & Clinical Medical School, Lanzhou University, Lanzhou, China

06052026

Vol. 14, No. 5, pp. 477-492. Dates: 30 April 2026; 26 May 2026; 29 May 2026

2014

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Childhood obesity has emerged as a major global public health challenge, closely linked to a range of metabolic disorders and long-term adverse health outcomes. Early risk identification and precise stratification are critical for effective prevention and intervention. In recent years, driven by the rapid expansion of public health data and methodological advances, traditional regression models, longitudinal data analysis, structural equation modeling, and machine learning algorithms have been widely applied in the early prediction of childhood obesity. These approaches enable the effective integration of demographic, behavioral, environmental, and physiological indicators, thereby improving predictive accuracy and practical utility. This review summarizes the application of statistical models and machine learning algorithms in predicting childhood obesity and related metabolic disorders, covering model construction, predictor selection, performance evaluation, and real-world application. Furthermore, current limitations, research gaps, and future directions—including multi-omics integration, long-term cohort validation, and personalized intervention models—are discussed. This review aims to provide a theoretical reference for early warning, precision prevention, and health management of childhood obesity.

Keywords: Childhood Obesity, Metabolic Disorders, Statistical Models, Machine Learning, Risk Prediction, Precision Prevention

1. Introduction

Childhood obesity has emerged as one of the most pressing global public health challenges of the 21st century. Over the past four decades, the prevalence of overweight and obesity among children and adolescents has risen substantially worldwide. A landmark pooled analysis of 2416 population-based studies involving 128.9 million participants reported that the global age-standardised prevalence of obesity increased from 0.7% to 5.6% in girls and from 0.9% to 7.8% in boys between 1975 and 2016 [1]. More recent evidence confirms that this upward trend has continued, with an overall obesity prevalence of 8.5% (95% CI: 8.2 - 8.8) in children and adolescents up to 2023, representing a 1.5-fold increase compared with 2000-2011 [2]. A 2025 commentary in The Lancet further underscores that childhood obesity is now a major public health crisis both nationally and internationally, driven by an imbalance between caloric intake and energy expenditure, and compounded by genetic, behavioural and environmental factors [3].

The public health significance of early prediction cannot be overstated. Childhood obesity is strongly associated with a range of adverse health consequences that track into adulthood, including type 2 diabetes, dyslipidaemia, hypertension, and non-alcoholic fatty liver disease. According to the Global Burden of Disease Study 2021, high body-mass index (BMI) is one of the leading metabolic risk factors globally, with age-standardised disability-adjusted life-year (DALY) rates attributable to high BMI increasing by 15.7% between 2000 and 2021 [4]. These metabolic disorders not only reduce quality of life in childhood but also impose a substantial long-term burden on healthcare systems. Identifying at-risk children before obesity becomes established enables timely, targeted interventions that are more effective and less costly than treating established disease.

Statistical models and machine learning (ML) have demonstrated considerable value in improving early risk prediction. Traditional statistical approaches, such as logistic regression, longitudinal growth curve models, and structural equation modelling, provide interpretable frameworks for quantifying associations and testing causal pathways. Meanwhile, ML algorithms, including random forests, XGBoost, and neural networks, excel at capturing non-linear interactions and handling high-dimensional data from electronic health records, behavioural questionnaires, and multi-omics sources. These techniques facilitate the integration of diverse risk factors and enable accurate individual-level risk stratification. This review systematically summarises the application of statistical models and ML in predicting childhood obesity and related metabolic disorders, discusses current limitations and future directions, and aims to provide a theoretical reference for early warning, precision prevention, and health management.

2. Risk Factors for Childhood Obesity

Childhood obesity arises from a complex interplay of demographic, behavioral, and psychosocial factors. A comprehensive understanding of these risk domains is essential for developing accurate prediction models and targeted interventions.

2.1. Demographic and Socioeconomic Factors

Large-scale meta-analyses have consistently identified sociodemographic characteristics as fundamental determinants. A meta-analysis of over 45 million children from 154 countries reported higher obesity prevalence in high-income countries and those with Human Development Index scores ≥ 0.8 [2]. Parental education, household income, and neighborhood deprivation are strongly associated with childhood weight status [5]. Specifically, lower socioeconomic status operates through multiple pathways, including reduced access to healthy foods and safe recreational spaces [5].

2.2. Dietary Behaviors and Physical Activity

Unhealthy dietary patterns and insufficient physical activity remain the most direct behavioral drivers. A systematic review of 126 prediction models identified multiple lifestyle-related predictors, noting that sleep duration, sleep quality, and eating speed are significant modifiable factors [6]. Higher birth weight, rapid infant weight gain, and absence of breastfeeding were among the seven strongest early-life risk factors [5]. Furthermore, children with obesity have significantly higher risks of comorbidities such as hypertension and depression [2], indirectly reinforcing the behavioral-metabolic link.

2.3. Screen Time and Unhealthy Lifestyle

Sedentary behaviors, particularly excessive screen time, have emerged as independent risk factors. Controllable lifestyle factors, including sedentary behavior, are consistently incorporated into high-performing prediction models [6]. Prolonged screen exposure not only displaces physical activity but also increases exposure to food marketing and disrupts sleep patterns [5].

2.4. Psychological and Family Environmental Factors

Family environment and parental behaviors play crucial moderating roles. Russell et al. systematically reviewed studies on disadvantaged populations and found that parenting styles, feeding practices, and family routines are closely intertwined with child eating behaviors and weight trajectories [7]. However, they noted substantial clustering of risk factors by socioeconomic status and ethnicity, making it difficult to isolate independent effects [7]. Parental obesity, maternal prepregnancy BMI, and gestational weight gain are among the strongest predictors [5]. Table 1 summarizes key risk factors and their evidence levels based on the reviewed literature.

The majority of identified risk factors are modifiable and lifestyle-related [6], which underscores their value for both prediction modeling and preventive interventions. However, the clustering of socioeconomic and behavioral risks necessitates multi-domain assessment strategies [7].

Table 1. Key risk factors and evidence levels based on the reviewed literature

Risk Domain	Specific Factors	Strength of Evidence
Demographic/Socioeconomic	High national HDI, lower household income, neighborhood deprivation	Strong
Early-Life	Higher birth weight, rapid infant weight gain, and no breastfeeding	Strong (consistent in 12/12 to 28/31 studies)
Dietary/Activity	Short sleep duration, fast eating speed, and low physical activity	Moderate to strong
Family Environment	Maternal prepregnancy obesity, parental feeding styles, and family dysfunction	Moderate (evidence clustered)

3. Commonly Used Statistical Models

Statistical models play a fundamental role in understanding the multifactorial etiology of childhood obesity and quantifying the contributions of diverse risk factors. Traditional regression approaches remain widely used for their interpretability and robust inferential properties, while longitudinal methods capture dynamic growth trajectories, and structural equation modeling (SEM) enables examination of complex causal pathways involving latent constructs.

3.1. Traditional Regression Models

Logistic regression is the predominant method for binary outcomes, such as obesity (yes/no) or exceeding the 95th BMI percentile. Its primary advantage lies in producing odds ratios that are readily interpretable for clinical and policy audiences. A validation study across four demographically diverse U.S. cohorts applied multivariable logistic regression to predict childhood obesity at ages 4 - 6 years using five clinical variables (maternal age, maternal prepregnancy BMI, birth weight z-score, weight-for-age z-score change, and breastfeeding duration) [8]. The models achieved acceptable-to-good discrimination, with area under the receiver operating characteristic curve (AUC) values ranging from 0.79 to 0.86 across cohorts, and negative predictive values ≥ 80% [8]. This demonstrates that logistic regression, even with a parsimonious set of predictors, can reliably identify high-risk children in routine clinical settings.
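To make the link between regression coefficients, odds ratios, and individual predicted risk concrete, the following minimal sketch applies a logistic model to the five predictors named above. The intercept, coefficients, variable names, and example child are hypothetical values chosen for illustration; they are not the fitted estimates reported in [8].

```python
import math

# Hypothetical log-odds coefficients for illustration only; these are NOT the
# fitted values from the validation study cited as [8].
INTERCEPT = -2.0
COEFFICIENTS = {
    "maternal_age_years": -0.01,
    "maternal_prepregnancy_bmi": 0.05,
    "birth_weight_z": 0.30,
    "weight_for_age_z_change": 0.90,
    "breastfeeding_months": -0.05,
}


def odds_ratios(coefficients):
    """Odds ratio per one-unit increase in each predictor: exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def predicted_risk(child, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Predicted probability: logistic transform of the linear predictor."""
    linear = intercept + sum(beta * child[name] for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))


def negative_predictive_value(true_negatives, false_negatives):
    """Share of children classified as low risk who did not develop obesity."""
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical child used only to exercise the functions.
example_child = {
    "maternal_age_years": 30,
    "maternal_prepregnancy_bmi": 27.0,
    "birth_weight_z": 0.5,
    "weight_for_age_z_change": 1.2,
    "breastfeeding_months": 3,
}
example_risk = predicted_risk(example_child)
example_odds_ratios = odds_ratios(COEFFICIENTS)
```

An odds ratio above 1 (here, for weight-for-age z-score change) marks a predictor that raises the odds of obesity, while a value below 1 (here, for breastfeeding duration) marks a protective association under these assumed coefficients.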

Multiple linear regression is commonly used when the outcome is a continuous measure, such as BMI z-score or percent body fat. Schreuder et al. employed linear regression to model excessive gain in BMI z-score between ages 2 and 5 - 7 years, comparing different growth measures (weight, weight-for-length, and BMI) measured at multiple time points during infancy [9]. Their analysis revealed that models incorporating the BMI peak and prepeak velocity achieved notably higher accuracy (derivation AUC: 0.765 - 0.855) than those relying solely on change over time (AUC: 0.706 - 0.795). However, the authors caution that performance degraded substantially upon external validation (AUC dropping by an average of 0.126), underscoring the importance of external validation and the limitations of linear regression when predictor-outcome relationships are non-linear or confounded by unmeasured factors [9].

3.2. Longitudinal Data Analysis

Childhood obesity develops over time, and repeated measurements offer richer insight than cross-sectional data. Linear mixed models (LMMs) account for within-subject correlation, handle irregularly spaced measurements, and accommodate missing data under missing-at-random assumptions. Although none of the longitudinal studies reviewed here used LMMs explicitly, LMMs underpin more specialized growth modeling approaches.

Growth curve models (GCMs), including latent class growth analysis (LCGA) and group-based trajectory modeling (GBTM), are widely used to identify distinct BMI trajectory subgroups. Michael et al. used latent class growth mixture modeling to identify five BMI z-score trajectories from birth to age 6 years in a Singaporean mother-offspring cohort [10]. Two obesogenic trajectories were identified: an “early-acceleration” pattern characterized by elevated fetal abdominal growth and crossing of the obesity threshold by age 2 years, and a “late-acceleration” pattern approaching the obesity threshold by age 6 years. Both trajectories were associated with elevated cardiometabolic risk markers at age 6, including abdominal fat, liver fat, and insulin resistance [10]. Similarly, Zhou et al. applied GBTM to data from the Ma’anshan birth cohort (n = 2705) to examine physical growth trajectories before age 72 months, finding that children with persistently high BMI, waist circumference, or body fat trajectories had significantly higher risk of early adiposity rebound (relative risks ranging from 2.83 to 4.17) [11]. Notably, infants with low BMI trajectories in the first two years who subsequently experienced rapid weight gain were also at elevated risk [11].

A key methodological insight from longitudinal studies is that trajectory-based prediction outperforms single time-point measurements. Huang et al. analyzed data from the Boston Birth Cohort (n = 3029) and identified four distinct BMI percentile trajectories from birth to age 18 years [12]. Using multinomial logistic regression, they demonstrated that BMI percentile trajectories during early childhood (birth to age 1 or 2 years) were superior to a single BMI measurement at age 1 or 2 years for predicting school-age overweight/obesity. Their imputation approach reduced missing data from 36.0% to 10.1%, highlighting a practical solution to a common challenge in longitudinal cohort studies [12]. Table 2 summarizes key methodological features and findings from the longitudinal studies reviewed.
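The idea of grouping children by the shape of their growth curve rather than by a single measurement can be sketched with a plain k-means clustering of fixed-length BMI z-score vectors. This is a simplified stand-in and not the latent class growth mixture modeling or GBTM used in [10]-[12]; the ages, the synthetic trajectories, and the choice of starting centroids are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_trajectory(trajectories):
    """Point-wise mean of several trajectories."""
    count = len(trajectories)
    return [sum(values) / count for values in zip(*trajectories)]


def kmeans_trajectories(trajectories, initial_centroids, max_iter=50):
    """Assign each trajectory to its nearest centroid until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(t, centroids[k]))
            for t in trajectories
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [t for t, lab in zip(trajectories, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = mean_trajectory(members)
    return labels, centroids


# Synthetic BMI z-scores at ages 0, 1, 2, 4 and 6 years (illustrative only).
stable = [[0.0, 0.1, 0.0, 0.1, 0.0], [-0.2, 0.0, -0.1, 0.0, 0.1]]
early_acceleration = [[0.2, 1.2, 1.9, 2.1, 2.2], [0.1, 1.0, 1.8, 2.0, 2.3]]
late_acceleration = [[0.0, 0.2, 0.4, 1.2, 1.8], [0.1, 0.1, 0.5, 1.1, 1.7]]
all_trajectories = stable + early_acceleration + late_acceleration

# Deterministic start: one seed trajectory per expected pattern.
seeds = [all_trajectories[0], all_trajectories[2], all_trajectories[4]]
trajectory_labels, trajectory_centroids = kmeans_trajectories(all_trajectories, seeds)
```

Unlike model-based approaches, this sketch gives hard assignments without posterior class probabilities and requires every child to be measured at the same ages, which is one reason the reviewed studies rely on latent class methods instead.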

3.3. Structural Equation Modeling (SEM)

SEM extends traditional regression by modeling relationships among observed and latent variables, estimating direct and indirect effects, and accounting for measurement error. This is particularly valuable for childhood obesity research, where constructs such as “family obesogenic environment” or “healthy lifestyle” cannot be directly observed but are indicated by multiple measured variables.

Table 2. Longitudinal trajectory studies in childhood obesity prediction

Cohort (N)	Method	Key Trajectories Identified	Main Finding
Singapore Mother-Offspring (994)	Latent class growth mixture modeling	Five BMI z-score trajectories, two of them obesogenic: early-acceleration and late-acceleration	Both obesogenic trajectories are linked to cardiometabolic alterations at age 6
Ma’anshan Birth Cohort (2705)	Group-based trajectory modeling	Persistently high BMI, waist circumference, or body fat trajectories	Risk of early adiposity rebound 2.8 - 4.2 times higher in high-trajectory groups
Boston Birth Cohort (3029)	Time-series K-means + latent class growth analysis	Early-onset overweight/obesity (OWO); late-onset OWO; normal stable; low stable	Early trajectories predict school-age OWO better than single-time-point BMI

Path analysis (SEM without latent variables) quantifies mediated pathways. Using cross-sectional data from 861 Argentine schoolchildren, Mendez et al. applied SEM to explore how socioeconomic status influences childhood obesity through health-related habits [13]. Their model showed acceptable fit (CFI = 0.979, RMSEA = 0.048) and revealed that healthy habits, particularly physical activity and maternal nutritional status, fully mediated the relationship between socioeconomic status and child obesity. Socioeconomic status positively influenced healthy habits, which in turn negatively influenced obesity factors (BMI, body fat, waist-to-height ratio) [13]. This indirect pathway would have been obscured in a standard regression analysis.
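The logic of such a mediation analysis can be written as two linear equations. The notation below is the generic single-mediator form, not the specification fitted in [13]: X stands for socioeconomic status, M for healthy habits, Y for an obesity indicator, and the path labels a, b, and c' are assumed symbols.

```latex
\begin{aligned}
M &= i_M + a\,X + \varepsilon_M \\
Y &= i_Y + c'\,X + b\,M + \varepsilon_Y \\
\text{indirect effect} &= a\,b, \qquad \text{total effect} = c' + a\,b
\end{aligned}
```

In this notation, the pattern described above corresponds to a > 0 and b < 0, so the indirect effect ab is negative, and full mediation corresponds to a direct effect c' that is close to zero.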

Latent variable SEM enables testing of complex theoretical frameworks. Rahmaty et al. combined latent profile analysis (to derive feeding practice patterns) with SEM to examine associations with preschooler BMI z-score (BMIz) in 437 children [14]. Three feeding practice patterns were identified: Controlling, Balancing, and Regulating. The Regulating pattern (characterized by autonomy-promoting practices) was associated with significantly lower child BMIz (b = −0.09) compared to the Controlling pattern. Higher difficult temperament, higher caregiver BMIz, and caregiver desire for a thinner child were also associated with higher BMIz (all p < 0.05) [14]. Villodres et al. used SEM to examine relationships among screen time, sleep time, physical fitness, Mediterranean diet adherence, eating behaviors, and BMI in 653 Spanish preschoolers [15]. Negative associations emerged between screen time and physical fitness (p < 0.005), screen time and Mediterranean diet adherence (p < 0.005), and Mediterranean diet adherence and BMI (p = 0.033), while pro-intake behaviors were positively associated with BMI (p < 0.005). Multi-group analysis revealed that these relationships differed by child sex and BMI category [15]. Matias et al. employed SEM to test moderation, finding that a longer duration of full breastfeeding attenuated the obesity risk associated with high gestational weight gain, an interaction effect that SEM can model directly [16].

Traditional regression models (logistic and linear) remain essential for their interpretability and ease of implementation in clinical risk scoring [8] [9]. Longitudinal methods, particularly growth curve and trajectory analyses, provide detailed insight into developmental patterns and enable early identification of children on obesogenic pathways [10]-[12]. SEM offers a powerful framework for testing mediational and moderational hypotheses involving latent constructs, thereby advancing causal understanding of childhood obesity [13]-[16]. The choice among these approaches should be guided by the research question, data structure, and the balance between predictive performance and interpretability.

4. Machine Learning Prediction Algorithms

Machine learning (ML) has emerged as a transformative approach for childhood obesity prediction, offering the ability to model complex, non-linear relationships and high-dimensional data without strong prior assumptions [17] [18]. Unlike traditional regression models that require prespecification of interaction and polynomial terms, ML algorithms automatically capture intricate patterns among demographic, behavioral, and physiological predictors. This section reviews key ML algorithms applied to pediatric obesity prediction, followed by a discussion of model evaluation metrics.

4.1. Decision Trees, Random Forest, and XGBoost

Decision trees partition the feature space into hierarchical if-then rules, producing interpretable flowcharts that map predictor values to weight status categories. However, single trees are prone to overfitting and instability. To address these limitations, ensemble methods combine multiple trees to improve predictive performance and robustness.

Random Forest (RF) builds a large collection of decorrelated decision trees through bootstrap aggregating (bagging) and random feature selection at each split. By averaging predictions across hundreds of trees, RF reduces variance and achieves superior generalization [19]. Liu et al. used RF as both a feature selection method and a predictive model in a population-based study of 3.86 million student-visits, demonstrating excellent stability in identifying key predictors of weight status up to five years in advance [19].

Extreme Gradient Boosting (XGBoost) represents a more advanced ensemble technique that builds trees sequentially, with each new tree correcting the errors of its predecessors. Through gradient-based optimization, regularization parameters, and handling of missing values, XGBoost has consistently outperformed other ML algorithms in pediatric obesity prediction [19] [20]. In a study involving 442,898 primary school students followed through secondary school, XGBoost achieved the highest long-term prediction accuracy (0.72 - 0.74) and macro-AUC (0.83 - 0.86) compared to decision trees, RF, k-nearest neighbors, and support vector machines [20]. The authors concluded that XGBoost enables accurate long-term weight status prediction using easily assessable variables such as weight, height, sex, age, and physical activity frequency.
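The sequential error-correcting principle described above can be illustrated with a few lines of plain gradient boosting on squared-error loss, in which each one-split tree (a stump) is fitted to the residuals left by the previous trees. This is a teaching sketch rather than XGBoost itself: it omits the regularization, second-order optimization, and missing-value handling mentioned above, and the single predictor, the synthetic data, and the hyperparameters are assumptions.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on x by squared error (needs >= 2 distinct x values)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted_stumps(x, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, xi)
            for p, xi in zip(predictions, x)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict_boosted(model, xi):
    correction = sum(predict_stump(s, xi) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction


def mean_squared_error(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


# Synthetic example: early weight-for-age z-score change vs. later BMI z-score.
z_change = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
later_bmi_z = [0.0, 0.0, 0.1, 0.2, 1.0, 1.8, 2.0]

boosted_model = fit_boosted_stumps(z_change, later_bmi_z)
boosted_predictions = [predict_boosted(boosted_model, xi) for xi in z_change]
baseline_mse = mean_squared_error(later_bmi_z, [boosted_model["base"]] * len(later_bmi_z))
boosted_mse = mean_squared_error(later_bmi_z, boosted_predictions)
```

On the training data, each round can only lower the squared error when the learning rate is between 0 and 1, which is why held-out validation, rather than training error, is needed to choose the number of rounds.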

4.2. Support Vector Machine (SVM)

Support Vector Machine (SVM) constructs an optimal hyperplane that maximizes the margin between different classes. Using kernel functions (e.g., radial basis function, polynomial), SVM can capture non-linear decision boundaries in high-dimensional feature spaces without explicitly transforming the data [17]. This property makes SVM particularly suitable for obesity risk classification when the relationship between predictors and outcomes is complex, but the sample size is moderate. However, SVM provides less interpretable results compared to tree-based methods and requires careful tuning of kernel parameters and regularization costs [18]. In comparative studies, SVM generally performs competitively but is often outperformed by gradient boosting methods in large-scale pediatric datasets [20].
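The kernel idea can be shown directly: a radial basis function (RBF) kernel scores the similarity of two children's predictor vectors, and a trained SVM classifies a new child from a weighted sum of similarities to its support vectors. The sketch below assumes an already trained model; the support vectors, signed dual coefficients, bias, and gamma are hypothetical values, not results from any cited study.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF similarity exp(-gamma * ||x - z||^2); equals 1 when x and z coincide."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision_value(x, support_vectors, dual_coefficients, bias, gamma=0.5):
    """Signed score of a kernel SVM: positive values fall on the 'at risk' side.

    Each dual coefficient is the multiplier alpha_i times the class label y_i.
    """
    weighted = sum(
        coefficient * rbf_kernel(vector, x, gamma)
        for vector, coefficient in zip(support_vectors, dual_coefficients)
    )
    return weighted + bias


# Hypothetical standardized predictors: [BMI z-score, screen-time z-score].
support_vectors = [[1.5, 1.0], [-1.0, -0.5]]
dual_coefficients = [1.0, -1.0]  # assumed alpha_i * y_i values
bias = 0.0

high_risk_score = svm_decision_value([1.4, 0.9], support_vectors, dual_coefficients, bias)
low_risk_score = svm_decision_value([-0.9, -0.6], support_vectors, dual_coefficients, bias)
```

The parameter gamma controls how quickly similarity decays with distance, which is one of the kernel parameters that requires the careful tuning noted above.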

4.3. Neural Networks and Deep Learning

Neural networks (NNs) consist of interconnected layers of artificial neurons that learn hierarchical representations of input data. A standard multilayer perceptron (MLP) with one or more hidden layers can approximate any continuous function on a bounded input domain to arbitrary accuracy, given enough hidden units, making it highly flexible for obesity prediction [17] [18]. Forte et al. developed a neural network model to classify obesity risk in 654 Portuguese adolescents aged 10 - 19 years using physical fitness variables (aerobic fitness, upper limb strength, sprint time) along with age and sex [21]. Their NN achieved 75% accuracy and an AUC of 0.64 after K-fold cross-validation, indicating only modest discrimination.

Deep learning (DL) extends neural networks with many hidden layers, enabling automatic feature extraction from raw or high-dimensional data such as electronic health records (EHRs), wearable sensor data, or medical imaging. Gupta et al. built a customized sequential deep learning model using EHR data from 36,191 children aged 0 - 10 years to predict obesity onset within the next three years [22]. Their model achieved AUROC > 0.8 for all age subgroups (most around 0.9) and demonstrated robustness through temporal, geographical, and subgroup validation. Notably, the model relied exclusively on routinely collected EHR variables (e.g., weight, height, BMI records, clinical encounters) without requiring specialized prenatal or lifestyle data, greatly facilitating clinical integration. The authors emphasized that deep learning can serve as an objective screening tool to enable early lifestyle counseling.

4.4. Model Evaluation Metrics

Rigorous evaluation is essential to ensure that ML models generalize beyond their training data. The most common metrics are summarized in Table 3.

AUC (area under the receiver operating characteristic curve) and accuracy are the most frequently reported metrics [18] [19]. In a large-scale study, Liu et al. reported micro-AUCs of 0.96, 0.93, and 0.92 for 1-, 3-, and 5-year predictions, respectively, using XGBoost [19]. However, a critical limitation identified by a recent systematic review and meta-analysis is that while accuracy and AUC are commonly reported [23], no included study assessed model calibration, that is, the agreement between predicted probabilities and observed event rates. The pooled AUC for logistic regression (0.75) and ML (0.76) showed no statistically significant difference, challenging the assumption that ML universally outperforms traditional methods when sample sizes are modest [23].

Table 3. Common performance metrics for childhood obesity prediction models

Metric	Definition	Interpretation
Accuracy	(TP + TN)/(TP + TN + FP + FN)	Overall proportion of correct predictions; may be misleading with imbalanced classes
AUC (Area under ROC Curve)	The probability that a randomly chosen obese child is ranked higher than a randomly chosen non-obese child	0.5 = random; 0.7 - 0.8 = acceptable; >0.8 = good discrimination
Precision	TP/(TP + FP)	Proportion of predicted positive cases that are truly obese
Recall (Sensitivity)	TP/(TP + FN)	Proportion of actual obese cases correctly identified
F1-Score	2 × (Precision × Recall)/(Precision + Recall)	Harmonic mean of precision and recall
Calibration	Agreement between predicted probabilities and observed frequencies (e.g., Hosmer-Lemeshow test, calibration plots)	Well-calibrated models have predicted risks matching actual event rates
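The definitions in Table 3 translate directly into code. The sketch below computes the threshold-based metrics from confusion-matrix counts and the AUC from its ranking definition (the share of obese/non-obese pairs in which the obese child receives the higher score, with ties counted as one half). The labels, scores, and the 0.5 threshold are synthetic values assumed for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = obese, 0 = non-obese."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_ratio(2 * precision * recall, precision + recall),
    }


def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic labels and predicted risks for eight children.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
risk_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
predicted = [1 if score >= 0.5 else 0 for score in risk_scores]

metrics = classification_metrics(observed, predicted)
auc = auc_by_ranking(observed, risk_scores)
```

Accuracy, precision, recall, and F1 all depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, so the two kinds of metric answer different questions.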

Validation strategies include internal validation (e.g., cross-validation, bootstrapping) and external validation on temporally or geographically distinct cohorts. Gupta et al. exemplified rigorous validation by testing their deep learning model across different time periods, geographic regions, and demographic subgroups [22]. The authors of the meta-analysis emphasized that many ML studies are at high risk of bias due to inadequate validation and recommend that future research prioritize calibration assessment and external validation in diverse populations [23].
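Because calibration is the property most often left unreported, a minimal sketch of a grouped calibration check may be useful. It sorts children by predicted risk, splits them into equally sized groups, and compares the mean predicted risk with the observed obesity rate in each group; these are the numbers a calibration plot displays. The data and the number of groups are assumptions, and the simple mean gap stands in for formal tests such as Hosmer-Lemeshow.

```python
def calibration_table(y_true, predicted_risk, n_groups=4):
    """Rows of (mean predicted risk, observed event rate, group size)."""
    pairs = sorted(zip(predicted_risk, y_true))
    size = len(pairs) / n_groups
    table = []
    for g in range(n_groups):
        group = pairs[int(g * size): int((g + 1) * size)]
        if not group:
            continue
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(y for _, y in group) / len(group)
        table.append((mean_predicted, observed_rate, len(group)))
    return table


def mean_calibration_gap(y_true, predicted_risk):
    """Observed event rate minus mean predicted risk (0 means no overall bias)."""
    observed_rate = sum(y_true) / len(y_true)
    mean_predicted = sum(predicted_risk) / len(predicted_risk)
    return observed_rate - mean_predicted


# Synthetic predicted risks and outcomes for eight children.
example_risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
example_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

example_table = calibration_table(example_outcomes, example_risks)
example_gap = mean_calibration_gap(example_outcomes, example_risks)
```

A positive gap means the model underestimates risk on average in the evaluated cohort; repeating the check on an external cohort shows whether calibration transfers, which discrimination metrics alone cannot reveal.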

Tree-based ensemble methods (especially XGBoost) currently demonstrate the strongest performance in large-scale pediatric obesity prediction, while deep learning offers unique advantages when rich EHR or sensor data are available. SVM remains a viable alternative for smaller datasets. Regardless of algorithm choice, rigorous evaluation, including discrimination, calibration, and external validation, is essential for clinical translation [23].

5. Application in Early Warning and Risk Stratification

Effective translation of predictive models into clinical and public health practice requires three interconnected steps: accurate identification of high-risk populations, design of precision prevention strategies, and integration into health management policies.

5.1. Identification of High-Risk Populations

Systematic reviews have confirmed that prediction models for childhood obesity achieve moderate to good discrimination, with a pooled C-index of 0.769 (95% CI: 0.754 - 0.785) for overweight and 0.835 (95% CI: 0.792 - 0.879) for obesity in training sets [6]. Importantly, most predictive factors are modifiable lifestyle behaviours (e.g., sleep duration, eating speed), making them actionable in routine screening. While machine learning (ML) methods are increasingly popular, a recent meta-analysis found no statistically significant difference in area under the curve (AUC) between ML (pooled AUC: 0.76) and logistic regression (pooled AUC: 0.75) for obesity risk prediction [23]. This suggests that simpler, more interpretable models may be equally effective for initial risk stratification when calibration and external validation are properly conducted.

5.2. Precision Prevention and Targeted Intervention

Once high-risk children are identified, interventions should be tailored to individual and contextual drivers. For example, longitudinal analyses of BMI trajectories among children with obesity show that those living in areas with a higher area deprivation index (ADI) have significantly greater odds of following an increasing BMI trajectory [24]. This finding supports targeting not only individual behaviours but also neighborhood-level social determinants, particularly in rural settings where concentrated disadvantage is more common. Predictive models can thus guide resource allocation, referring high-risk children from deprived areas to multi-component, family-centered interventions, while lower-risk children may benefit from universal primary prevention.

5.3. Health Management and Policy Implications

At the policy level, digital health strategies provide a framework for embedding risk prediction into routine child health surveillance. Comparative policy analyses highlight the World Health Organization’s Global Strategy on Digital Health 2020-2025 as a key blueprint, though many national plans still lack elements such as knowledge management and health equity promotion [25]. Integrating validated prediction models into electronic health records, coupled with clear protocols for referral and follow-up, could bridge the gap between risk identification and effective intervention. As summarised in Table 4, successful implementation requires alignment of predictive accuracy, actionable risk factors, and supportive digital health policies.

Early warning and risk stratification for childhood obesity are most effective when evidence-based prediction models are coupled with targeted interventions addressing both individual and ecological determinants, supported by coherent digital health policies.

Table 4. Key considerations for translating childhood obesity prediction models into practice

Component	Key Evidence	Policy/Action Implication
High-Risk Identification	Pooled C-index 0.769 - 0.835; no ML advantage over logistic regression	Use simple, validated tools with calibration; prioritise external validation
Precision Prevention	Area deprivation index predicts increasing BMI trajectory	Combine individual counselling with community-level interventions in high-deprivation areas
Health Management	WHO Digital Health Strategy 2020-2025	Embed models into EHRs; regularly update national digital health strategies

6. Limitations and Challenges

Despite promising advances, the application of statistical models and machine learning (ML) in childhood obesity prediction faces several critical limitations that hinder clinical translation and generalizability.

6.1. Small Sample Size and Single-Center Data

A substantial proportion of prediction studies suffer from small, non-representative samples and single-center data collection. In a systematic review and meta-analysis comparing logistic regression and ML for obesity risk prediction, 75% of included studies were at high risk of bias, predominantly due to inadequate sample sizes, lack of external validation, and no calibration assessment [23]. Single-center datasets often capture population-specific demographic and environmental patterns, leading to models that fail to generalize across diverse ethnic, socioeconomic, and geographic contexts [17].

6.2. Lack of Long-Term Follow-Up Cohorts

Most prediction models focus on short-term outcomes (1 - 3 years), yet childhood obesity is a chronic condition whose metabolic consequences (such as type 2 diabetes and fatty liver disease) emerge over decades. While ML can identify key predictors for 1-, 3-, and 5-year weight status, the number of required features increased with prediction horizon (from 6 to 13 predictors) [19], underscoring the need for long-term longitudinal data. However, such cohorts remain scarce due to high costs, attrition, and extended follow-up periods. Consequently, few models have been externally validated for predicting adolescent or young adult metabolic outcomes using early childhood predictors.

6.3. Poor Interpretability of Complex Models

Deep learning and ensemble methods, despite superior predictive performance reported in some studies, operate as “black boxes” [26]. Clinicians and parents are unlikely to trust or act upon predictions without understanding the driving risk factors for an individual child. The lack of model transparency, especially in deep neural networks, remains a major barrier to clinical adoption [17]. While post-hoc explainability techniques (e.g., SHAP) are emerging, they are not yet standard practice in pediatric obesity research, and their reliability in high-stakes health decisions is debated.
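The Shapley values that underlie SHAP can be computed exactly for a small model by averaging each feature's marginal contribution over every order in which features are switched from a reference child to the child being explained. The sketch below uses this brute-force definition, which is feasible only for a handful of features and is not the approximation algorithm used by the SHAP library; the risk function, feature names, and reference values are hypothetical.

```python
from itertools import permutations


def hypothetical_risk_score(features):
    """Toy risk score with one interaction term; coefficients are made up."""
    return (
        0.2 * features["screen_hours"]
        - 0.1 * features["sleep_hours"]
        + 0.05 * features["screen_hours"] * features["fast_eating"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline)."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to the child's value
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical child and reference ("average") child.
child = {"screen_hours": 4.0, "sleep_hours": 8.0, "fast_eating": 1.0}
reference = {"screen_hours": 2.0, "sleep_hours": 10.0, "fast_eating": 0.0}

attributions = shapley_values(hypothetical_risk_score, child, reference)
score_difference = hypothetical_risk_score(child) - hypothetical_risk_score(reference)
```

The attributions always sum to the difference between the child's score and the reference score, which gives a clinician a per-child breakdown; they describe the model's behavior, however, and do not by themselves establish that changing a factor would change the outcome.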

7. Future Directions

Advancing the prediction of childhood obesity and related metabolic disorders requires parallel progress in four interconnected domains: multi-omics integration, large-scale longitudinal cohorts, personalized intelligent interventions, and federated multi-center collaboration. Each direction addresses specific limitations of current models and opens new avenues for precision prevention.

1) Multi-omics integration prediction: Integrating genomic, gut microbiome, and metabolomic data promises to uncover early biological pathways underlying obesity risk. Aparicio et al. constructed a quadripartite network linking SNPs in the FHIT gene (associated with obesity and type 2 diabetes) to microbial taxa, plasma metabolites, and BMI in children from 6 months to 8 years of age, identifying novel risk markers for insulin resistance [27]. Complementing this, Rafiq et al. employed DIABLO integrative analysis in a South Asian birth cohort, revealing that Akkermansia and GABA were negatively associated with early childhood overweight/obesity, while Lactobacillus and glutamic acid showed positive associations [28]. These multi-omics signatures enhance predictive accuracy beyond single-layer models.

2) Large-scale longitudinal cohort study: Robust prediction requires large, diverse, prospectively followed populations. Singh et al. leveraged the UK Millennium Cohort Study (over 10,000 children) to predict teenage obesity using earlier childhood measurements, achieving 77% sensitivity and specificity with easily obtainable features suitable for clinical and non-clinical settings [5]. Extending such cohort designs across multiple geographic and ethnic groups will improve generalizability.

3) Personalized intelligent intervention model: Beyond risk prediction, the next frontier is adaptive, individualized interventions. Explainable AI (XAI) methods are critical to make black-box models interpretable for clinicians and families. Khater et al. demonstrated a Random Forest model free of BMI parameters, achieving 86.5% accuracy for obesity prediction, using SHAP and partial dependence plots to reveal key lifestyle drivers such as meal frequency and technology usage [29]. Shen et al. applied SHAP values to an AdaBoost model (89.2% accuracy) to demystify decision-making, enabling causal insights into eating habits and physical condition [30]. These XAI techniques can power just-in-time adaptive interventions tailored to each child’s modifiable risk profile.

4) Federated learning and multi-center collaboration: Privacy concerns and population heterogeneity limit data sharing across institutions. FETA, a federated transfer learning framework, integrates heterogeneous data from multiple healthcare sites without sharing individual-level records [31]. Applied to eMERGE Network data for extreme obesity genetic risk prediction, FETA outperformed models trained on target-only or source-only data, reducing performance disparities in underrepresented populations [31]. This approach enables large-scale, privacy-preserving model development while accommodating population diversity.
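
The general principle of training across sites without pooling individual-level records can be illustrated with a short Python sketch of plain federated averaging. This is not the FETA method of [31], which additionally uses transfer learning; it is a generic, simplified scheme shown under stated assumptions. The function names (local_update, federated_average), the one-feature linear model, the learning rate, and the synthetic noiseless site data are all hypothetical.

```python
def local_update(weights, data, lr=0.05, epochs=10):
    """Run a few gradient-descent epochs of least-squares regression
    on one site's data, starting from the current global weights."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Combine site models into one global model, weighting each site
    by its sample size. Only model weights are exchanged, never records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


# Synthetic data (assumed): every site follows y = 2x + 1, but each site
# observes a different range of x, mimicking heterogeneous populations.
sites = [
    [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5)],
    [(x, 2.0 * x + 1.0) for x in (1.0, 2.0, 3.0)],
    [(x, 2.0 * x + 1.0) for x in (0.2, 0.8, 2.5, 2.9, 3.0)],
]

global_weights = (0.0, 0.0)
for _ in range(200):  # communication rounds
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(d) for d in sites])

print(f"slope={global_weights[0]:.3f}, intercept={global_weights[1]:.3f}")
```

In this toy setting the global model recovers the shared slope and intercept even though no site sees the full range of the predictor. Real deployments would also need secure aggregation, handling of sites whose underlying relationships differ, and validation within each population.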

Combining multi-omics discovery with large-scale longitudinal validation, interpretable AI-driven personalization, and federated collaborative networks will transform childhood obesity prediction from population-level risk scores to equitable, precision-focused early warning systems.

8. Conclusion

Childhood obesity and its metabolic consequences pose a persistent global health challenge, but the growing toolkit of statistical models and machine learning algorithms offers unprecedented opportunities for early prediction and precision prevention. Traditional regression models, longitudinal methods, and SEM remain valuable for understanding etiological pathways and generating interpretable risk scores. Machine learning approaches, particularly random forests, XGBoost, and neural networks, consistently achieve superior predictive accuracy by capturing non-linear and interactive effects inherent in obesity development. Future progress depends on overcoming current limitations: small, non-representative datasets; lack of long-term follow-up; limited model interpretability; and incomplete integration of multi-omics data. Collaborative, multi-center, and privacy-preserving frameworks such as federated learning will be essential to develop generalizable, clinically useful tools. Ultimately, the goal is not simply better prediction, but actionable risk stratification that leads to earlier, more effective, and personalized interventions, reversing the trajectory of childhood obesity before metabolic disorders take root.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

Abarca-Gómez, L., Abdeen, Z.A., Hamid, Z.A., Abu-Rmeileh, N.M., Acosta-Cazares, B., Acuin, C., et al. (2017) Worldwide Trends in Body-Mass Index, Underweight, Overweight, and Obesity from 1975 to 2016: A Pooled Analysis of 2416 Population-Based Measurement Studies in 128·9 Million Children, Adolescents, and Adults. The Lancet, 390, 2627-2642. https://doi.org/10.1016/s0140-6736(17)32129-3

Zhang, X., Liu, J., Ni, Y., Yi, C., Fang, Y., Ning, Q., et al. (2024) Global Prevalence of Overweight and Obesity in Children and Adolescents: A Systematic Review and Meta-Analysis. JAMA Pediatrics, 178, 800-813. https://doi.org/10.1001/jamapediatrics.2024.1576

The Lancet (2025) Childhood Obesity: A Global Health Crisis. The Lancet, 406, 1193. https://doi.org/10.1016/s0140-6736(25)01906-3

Brauer, M., Roth, G.A., Aravkin, A.Y., Zheng, P., Abate, K.H., Abate, Y.H., et al. (2024) Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990-2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 403, 2162-2203. https://doi.org/10.1016/s0140-6736(24)00933-4

Blaauwendraad, S.M., Kamphuis, A.S.J., Ruiz-Ojeda, F.J., Brandimonte-Hernández, M., Flores-Ventura, E., Abrahamse-Berkeveld, M., et al. (2026) Risk Factors in the First 1000 Days of Life Associated with Childhood Obesity: A Systematic Review and Risk Factor Quality Assessment. Obesity Reviews, 27, e70025. https://doi.org/10.1111/obr.70025

Gou, H., Song, H., Tian, Z. and Liu, Y. (2024) Prediction Models for Children/Adolescents with Obesity/Overweight: A Systematic Review and Meta-analysis. Preventive Medicine, 179, Article ID: 107823. https://doi.org/10.1016/j.ypmed.2023.107823

Russell, C.G., Taki, S., Laws, R., Azadi, L., Campbell, K.J., Elliott, R., et al. (2016) Effects of Parent and Child Behaviours on Overweight and Obesity in Infants and Young Children from Disadvantaged Backgrounds: Systematic Review with Narrative Synthesis. BMC Public Health, 16, Article No. 151. https://doi.org/10.1186/s12889-016-2801-y

Funatake, C.J., Armendáriz, M., Rauch, S., Eskenazi, B., Nomura, Y., Hivert, M., et al. (2024) Validation of Variables for Use in Pediatric Obesity Risk Score Development in Demographically and Racially Diverse United States Cohorts. The Journal of Pediatrics, 275, Article ID: 114219. https://doi.org/10.1016/j.jpeds.2024.114219

Schreuder, A., Corpeleijn, E. and Vrijkotte, T. (2023) Modelling Individual Infancy Growth Trajectories to Predict Excessive Gain in BMI Z-Score: A Comparison of Growth Measures in the ABCD and GECKO Drenthe Cohorts. BMC Public Health, 23, Article No. 2428. https://doi.org/10.1186/s12889-023-17354-4

Michael, N., Gupta, V., Fogel, A., Huang, J., Chen, L., Sadananthan, S.A., et al. (2023) Longitudinal Characterization of Determinants Associated with Obesogenic Growth Patterns in Early Childhood. International Journal of Epidemiology, 52, 426-439. https://doi.org/10.1093/ije/dyac177

Zhou, J., Teng, Y., Zhang, S., Yang, M., Yan, S., Tao, F., et al. (2023) Birth Outcomes and Early Growth Patterns Associated with Age at Adiposity Rebound: The Ma’anshan Birth Cohort (MABC) Study. BMC Public Health, 23, Article No. 2405. https://doi.org/10.1186/s12889-023-17236-9

Huang, W., et al. (2023) Defining Longitudinal Trajectory of Body Mass Index Percentile and Predicting Childhood Obesity: Methodologies and Findings in the Boston Birth Cohort. Precision Nutrition, 2, e00037.

Mendez, I., Fasano, M.V. and Orden, A.B. (2023) Exploring Factors Associated with Obesity in Argentinian Children Using Structural Equation Modeling. Cadernos de Saúde Pública, 39, e00087822. https://doi.org/10.1590/0102-311xen087822

Rahmaty, Z., Johantgen, M.E., Storr, C.L., Wang, Y. and Black, M.M. (2023) Preschoolers BMI: Associations with Patterns of Caregivers’ Feeding Practices Using Structural Equation Models. Childhood Obesity, 19, 169-178. https://doi.org/10.1089/chi.2022.0026

Villodres, G., Padial-Ruz, R., Salas-Montoro, J. and Muros, J. (2024) Lifestyle Behaviours in Pre-Schoolers from Southern Spain—A Structural Equation Model According to Sex and Body Mass Index. Nutrients, 16, Article 3582. https://doi.org/10.3390/nu16213582

Matias, S.L., Anderson, C.E. and Koleilat, M. (2023) Breastfeeding Moderates Childhood Obesity Risk Associated with Prenatal Exposure to Excessive Gestational Weight Gain. Maternal & Child Nutrition, 19, e13545. https://doi.org/10.1111/mcn.13545

Huang, L., Huhulea, E.N., Abraham, E., Bienenstock, R., Aifuwa, E., Hirani, R., et al. (2025) The Role of Artificial Intelligence in Obesity Risk Prediction and Management: Approaches, Insights, and Recommendations. Medicina, 61, Article 358. https://doi.org/10.3390/medicina61020358

Azmi, S., Kunnathodi, F., Alotaibi, H.F., Alhazzani, W., Mustafa, M., Ahmad, I., et al. (2025) Harnessing Artificial Intelligence in Obesity Research and Management: A Comprehensive Review. Diagnostics, 15, Article 396. https://doi.org/10.3390/diagnostics15030396

Liu, H., Leng, Y., Wu, Y., Chau, P.H., Chung, T.W.H. and Fong, D.Y.T. (2024) Robust Identification Key Predictors of Short- and Long-Term Weight Status in Children and Adolescents by Machine Learning. Frontiers in Public Health, 12, Article 1414046. https://doi.org/10.3389/fpubh.2024.1414046

Liu, H., Wu, Y., Chau, P.H., Chung, T.W.H. and Fong, D.Y.T. (2024) Prediction of Adolescent Weight Status by Machine Learning: A Population-Based Study. BMC Public Health, 24, Article No. 1351. https://doi.org/10.1186/s12889-024-18830-1

Forte, P., Encarnação, S., Monteiro, A.M., Teixeira, J.E., Hattabi, S., Sortwell, A., et al. (2023) A Deep Learning Neural Network to Classify Obesity Risk in Portuguese Adolescents Based on Physical Fitness Levels and Body Mass Index Percentiles: Insights for National Health Policies. Behavioral Sciences, 13, Article 522. https://doi.org/10.3390/bs13070522

Gupta, M., Eckrich, D., Bunnell, H.T., Phan, T.T. and Beheshti, R. (2024) Reliable Prediction of Childhood Obesity Using Only Routinely Collected EHRs May Be Possible. Obesity Pillars, 12, Article ID: 100128. https://doi.org/10.1016/j.obpill.2024.100128

Boakye, N.F., O’Toole, C.C., Jalali, A. and Hannigan, A. (2025) Comparing Logistic Regression and Machine Learning for Obesity Risk Prediction: A Systematic Review and Meta-Analysis. International Journal of Medical Informatics, 199, Article ID: 105887. https://doi.org/10.1016/j.ijmedinf.2025.105887

Barbour, Z., Mojica, C., Alvarez, H.O. and Foster, B.A. (2024) Socio-Ecologic Influences on Weight Trajectories among Children with Obesity Living in Rural and Urban Settings. Childhood Obesity, 20, 624-633. https://doi.org/10.1089/chi.2023.0193

Holl, F., Kircher, J., Hertelendy, A.J., Sukums, F. and Swoboda, W. (2024) Tanzania’s and Germany’s Digital Health Strategies and Their Consistency with the World Health Organization’s Global Strategy on Digital Health 2020-2025: Comparative Policy Analysis. Journal of Medical Internet Research, 26, e52150. https://doi.org/10.2196/52150

Yi, X., He, Y., Gao, S. and Li, M. (2024) A Review of the Application of Deep Learning in Obesity: From Early Prediction Aid to Advanced Management Assistance. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 18, Article ID: 103000. https://doi.org/10.1016/j.dsx.2024.103000

Aparicio, A., et al. (2023) Genotype-Microbiome-Metabolome Associations in Early Childhood, and Their Link to BMI and Childhood Obesity. medRxiv.

Rafiq, T., Stearns, J.C., Shanmuganathan, M., Azab, S.M., Anand, S.S., Thabane, L., et al. (2023) Integrative Multi-omics Analysis of Infant Gut Microbiome and Serum Metabolome Reveals Key Molecular Biomarkers of Early Onset Childhood Obesity. Heliyon, 9, e16651. https://doi.org/10.1016/j.heliyon.2023.e16651

Khater, T., Tawfik, H. and Singh, B. (2024) Explainable Artificial Intelligence for Investigating the Effect of Lifestyle Factors on Obesity. Intelligent Systems with Applications, 23, Article ID: 200427. https://doi.org/10.1016/j.iswa.2024.200427

Shen, J., Li, S., Li, X.L., Wang, Y., Monday, H.N. and Nneji, G.U. (2024) Explainable AI for the Prediction and Estimation of Obesity Levels Using Machine Learning Models. 2024 4th International Conference on Electronic Information Engineering and Computer Science (EIECS), Yanji, 27-29 September 2024, 709-713. https://doi.org/10.1109/eiecs63941.2024.10800143

Li, S., Cai, T. and Duan, R. (2023) Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach. The Annals of Applied Statistics, 17, 2970-2992. https://doi.org/10.1214/23-aoas1747