Research Project (Approved Project List)

KNN Research Summary

Title Unsupervised clustering of growth and neurodevelopmental outcomes in VLBW infants: association with neonatal determinants
Author 정성훈
Background With advancements in neonatal intensive care, the survival rates of very low birth weight (VLBW) infants have improved significantly. However, long-term morbidities, such as extrauterine growth restriction and neurodevelopmental delay, remain major challenges. Existing studies often analyze growth and neurodevelopment separately or focus on predicting outcomes based on single neonatal diseases, such as bronchopulmonary dysplasia (BPD), intraventricular hemorrhage (IVH), or periventricular leukomalacia (PVL), which may not fully capture the complex, multidimensional nature of long-term prognosis. Unsupervised machine learning methods offer a data-driven approach to identify latent clinical subgroups (phenotypes) based on comprehensive outcomes. By concurrently analyzing growth and neurodevelopmental data, we can identify distinct phenotypes, such as "growth failure only," "neurodevelopmental delay only," or "global delay," and trace their origins to specific neonatal risk factors. This approach can facilitate early identification of high-risk groups and the development of tailored intervention strategies.
Aim / Hypothesis The primary aim of this study is to identify distinct phenotypic subgroups in VLBW infants using unsupervised clustering based on growth parameters (weight, height, head circumference) and neurodevelopmental assessment (Bayley Scales) during early childhood (at 18–24 months corrected age and/or 3 years of age). We further aim to investigate the association between these identified clusters and neonatal determinants, including morbidities and therapeutic interventions during NICU hospitalization.
Inclusion Criteria 1) VLBW infants (<1,500 g) registered in the KNN from 2013 to 2022. 2) Completed follow-up with available growth and neurodevelopmental data (Bayley Scales) at either 18–24 months corrected age or 3 years of age (or both).
Exclusion Criteria 1) Major congenital anomalies or chromosomal disorders. 2) Death before discharge or during the follow-up period. 3) Infants with missing data on key clustering variables at the designated follow-up time points.
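The inclusion and exclusion criteria above can be read as a single eligibility filter. The following is a minimal sketch, not the registry's actual schema: the `Infant` record and all of its field names are hypothetical, and it assumes that "registered from 2013 to 2022" can be checked against a registration year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Infant:
    # All field names are hypothetical; the real KNN variable names are not given.
    birth_weight_g: int
    registry_year: int
    major_anomaly: bool          # major congenital anomaly or chromosomal disorder
    died: bool                   # death before discharge or during follow-up
    weight_z: Optional[float]    # clustering variables; None means missing
    height_z: Optional[float]
    head_circ_z: Optional[float]
    bayley_z: Optional[float]


def is_eligible(infant: Infant) -> bool:
    """Apply the inclusion and exclusion criteria to one record."""
    clustering_vars = (
        infant.weight_z,
        infant.height_z,
        infant.head_circ_z,
        infant.bayley_z,
    )
    return (
        infant.birth_weight_g < 1500
        and 2013 <= infant.registry_year <= 2022
        and not infant.major_anomaly
        and not infant.died
        and all(v is not None for v in clustering_vars)
    )


cohort = [
    Infant(1200, 2015, False, False, -0.8, -1.1, -0.5, -0.3),   # eligible
    Infant(1600, 2015, False, False, -0.8, -1.1, -0.5, -0.3),   # not VLBW
    Infant(1100, 2018, False, False, -1.5, None, -0.9, -1.2),   # missing height
    Infant(900, 2020, True, False, -2.0, -2.2, -1.8, -2.5),     # major anomaly
]
eligible = [infant for infant in cohort if is_eligible(infant)]
```

Dropping records with missing clustering variables, rather than imputing them, follows exclusion criterion 3.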
Study Design / Statistical methods In the data pre-processing phase, growth parameters will be converted into Z-scores based on WHO Child Growth Standards, and BSID-II and BSID-III scores will be standardized to Z-scores within each test version to facilitate integration. Unsupervised clustering, using hierarchical or K-means algorithms, will be performed primarily on data from 18–24 months corrected age due to typically higher follow-up rates. A secondary analysis using 3-year outcome data will also be conducted, provided that data quality and quantity allow, to assess the stability of phenotypes and long-term trajectories. For statistical analysis, baseline characteristics and neonatal determinants will be compared across the identified clusters using ANOVA or Kruskal-Wallis tests for continuous variables and Chi-square or Fisher’s exact tests for categorical variables. Finally, multinomial logistic regression will be employed to identify independent risk factors associated with adverse outcome clusters.
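To illustrate the within-version standardization of Bayley scores described above, here is a minimal sketch using only the Python standard library. It assumes that standardization uses the cohort's own mean and sample standard deviation within each edition; the function name, the record layout, and the example scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_version(records):
    """Standardize scores separately for each test version.

    records: list of (version, score) tuples.
    Returns Z-scores in the same order as the input.
    Assumption: cohort mean and sample SD are used within each version.
    """
    scores_by_version = defaultdict(list)
    for version, score in records:
        scores_by_version[version].append(score)

    # Mean and sample SD per version (each version needs at least 2 scores).
    stats = {v: (mean(s), stdev(s)) for v, s in scores_by_version.items()}

    return [(score - stats[v][0]) / stats[v][1] for v, score in records]


# Made-up example scores, for illustration only.
records = [
    ("BSID-II", 70), ("BSID-II", 85), ("BSID-II", 100),
    ("BSID-III", 80), ("BSID-III", 95), ("BSID-III", 110),
]
z_scores = zscore_within_version(records)
```

In this toy example, the lowest score in each edition receives the same Z-score even though the raw scores differ, which is what allows the two editions to be pooled for clustering.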
Primary Outcomes 1) Identification and characterization of distinct growth and neurodevelopmental phenotypes (clusters) during early childhood. 2) Determination of determinants, including maternal characteristics, perinatal factors, and neonatal morbidities, associated with each identified cluster.
Secondary Outcomes and Definitions
1) Comparison of the prevalence of neurodevelopmental impairment (NDI), including cerebral palsy (CP), blindness, and deafness, across clusters.
2) Comparison of growth restriction rates (<10th percentile) across clusters.
3) Assessment of longitudinal changes in cluster assignment between 18–24 months and 3 years (if data permit).
Outcome Definitions:
Growth restriction: Z-score < -1.28 (<10th percentile).
Neurodevelopmental delay: BSID scores < -2 SD (or <70).
Cerebral palsy (CP): GMFCS level ≥ 2 (or ≥ 1 based on analysis).
Neonatal determinants: including BPD (moderate to severe), sepsis (culture-proven), IVH (grade ≥ III), PVL, and NEC (stage ≥ II).
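The numeric cutoffs in the outcome definitions can be checked and applied with a few lines of code. This is a sketch only: the function and constant names are hypothetical, and the Bayley cutoff assumes a composite score with a normative mean of 100 and SD of 15, so that 2 SD below the mean equals 70.

```python
from statistics import NormalDist

GROWTH_RESTRICTION_Z = -1.28   # cutoff stated in the outcome definitions
BSID_DELAY_CUTOFF = 70         # 100 - 2 * 15, assuming mean 100 and SD 15


def is_growth_restricted(z_score: float) -> bool:
    """True when the growth Z-score is below the 10th-percentile cutoff."""
    return z_score < GROWTH_RESTRICTION_Z


def has_neurodevelopmental_delay(bsid_composite: float) -> bool:
    """True when the Bayley composite score is more than 2 SD below the mean."""
    return bsid_composite < BSID_DELAY_CUTOFF


# The 10th percentile of a standard normal distribution is about -1.28,
# which is where the growth restriction cutoff comes from.
tenth_percentile_z = NormalDist().inv_cdf(0.10)
```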
Protocols
1. Data collection
  1) Clinical data will be extracted from the KNN registry.
  2) Growth and neurodevelopmental variables at 18–24 months (and 3 years) used for clustering; maternal, perinatal, and neonatal variables used for post-hoc comparison and risk factor analysis.
2. Data standardization and pre-processing
  1) Growth parameters converted to Z-scores using WHO Child Growth Standards.
  2) Bayley scores standardized to Z-scores within each test version (Internal Standardization).
  3) Missing values for covariates will be handled using multiple imputation; patients with missing clustering variables will be excluded.
3. Machine learning analysis
  1) Unsupervised clustering (Hierarchical clustering or K-means clustering).
  2) Cluster number determined by dendrogram and silhouette width.
  3) Visualization by radar charts and longitudinal growth trajectories.
  4) Post hoc comparisons for cluster characterization and multinomial logistic regression for risk factor identification.
4. Ethical considerations
  1) IRB approval obtained prior to data analysis.
  2) All data fully anonymized; no identifiable information used.
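The machine learning step of the protocol (hierarchical clustering with the cluster number chosen by silhouette width) could look roughly like the sketch below. It uses scikit-learn's `AgglomerativeClustering` with Ward linkage as one possible hierarchical method. The linkage, the candidate range of k, and the synthetic data are assumptions for illustration, not choices stated in the protocol, and dendrogram inspection is not shown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Synthetic Z-scores for illustration only (not registry data).
# Columns: weight, height, head circumference, Bayley.
rng = np.random.default_rng(0)
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],      # hypothetical "typical" profile
    [-2.5, -2.5, -2.5, 0.0],   # hypothetical "growth failure only" profile
    [0.0, 0.0, 0.0, -3.0],     # hypothetical "neurodevelopmental delay only" profile
])
X = np.vstack([c + rng.normal(scale=0.2, size=(50, 4)) for c in centers])

# Mean silhouette width for each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the k with the largest mean silhouette width and refit.
best_k = max(scores, key=scores.get)
best_labels = AgglomerativeClustering(
    n_clusters=best_k, linkage="ward"
).fit_predict(X)
```

On real data the silhouette curve is rarely this clear-cut, which is presumably why the protocol pairs it with visual inspection of the dendrogram.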
Funding None