KNN Research Summary

Title Identification of Clinical Subgroups in Preterm Infants with Retinopathy of Prematurity and Their Long-Term Outcomes Using Unsupervised Machine Learning
Author Chae Young Kim, Sung-Hoon Chung
Prepared by Sung-Hoon Chung
Background Retinopathy of Prematurity (ROP) is a major cause of preventable childhood blindness worldwide. Despite significant advancements in neonatal intensive care, the incidence of ROP remains considerable, especially among very low birth weight (VLBW) infants. Existing classification systems primarily rely on ophthalmologic findings and may not adequately reflect the clinical heterogeneity or underlying risk profiles of affected infants. Unsupervised machine learning methods, such as hierarchical clustering, offer a data-driven approach to identifying latent clinical subgroups, which could lead to more individualized monitoring and intervention strategies. Furthermore, investigating long-term outcomes such as neurodevelopmental impairment and visual disability can provide additional insights into prognostic differences between subgroups.
Aim / Hypothesis To identify clinically meaningful subgroups among preterm infants diagnosed with ROP using hierarchical clustering based on clinical variables available prior to ROP diagnosis, and to evaluate differences in long-term outcomes, including neurodevelopment and vision, among the identified subgroups. We hypothesize that preterm infants diagnosed with ROP exhibit clinically meaningful heterogeneity that can be uncovered using unsupervised machine learning methods. These phenotypic subgroups will differ in early clinical profiles and long-term outcomes.
Inclusion Criteria 1) Infants admitted to participating NICUs from 2013 to 2021. 2) Infants diagnosed with ROP of any stage.
Exclusion Criteria 1) Infants who died before undergoing ROP evaluation. 2) Infants with missing or incomplete ROP-related data. 3) Infants with major congenital anomalies or inborn errors of metabolism. 4) Infants with incomplete medical records.
Study Design / Statistical Methods This study uses data from the Korean Neonatal Network (KNN), a registry that has enrolled preterm infants with birth weight <1500 g or gestational age <32 weeks since 2013. For this analysis, infants diagnosed with any stage of ROP will be included. Unsupervised clustering will be performed using clinical variables available prior to ROP diagnosis, including demographic characteristics, antenatal exposures, and early morbidities. Gestational age and birth weight will be excluded from clustering so that they do not dominate the cluster structure, but they will be considered in post-clustering comparisons. Data preprocessing will include one-hot encoding of categorical variables and standardization of continuous variables. Missing data will be addressed using multiple imputation. Hierarchical clustering will be performed using Ward’s linkage with Euclidean distance. The number of clusters will be selected based on dendrogram structure and silhouette scores. Principal component analysis (PCA) may be used to help visualize the clusters. After clustering, clinical features and outcomes will be compared across clusters.
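The preprocessing and clustering steps described above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's actual pipeline: the column names, the candidate range of 2 to 6 clusters, and the helper functions `build_feature_matrix` and `choose_clusters` are all assumptions made for the example, and multiple imputation is omitted (the synthetic data have no missing values).

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are hypothetical, not KNN registry fields.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "apgar_5min": rng.integers(3, 10, size=n).astype(float),
    "ventilation_days": rng.gamma(2.0, 5.0, size=n),
    "delivery_mode": rng.choice(["vaginal", "cesarean"], size=n),
    "antenatal_steroids": rng.choice(["none", "partial", "complete"], size=n),
})
continuous = ["apgar_5min", "ventilation_days"]
categorical = ["delivery_mode", "antenatal_steroids"]


def build_feature_matrix(frame):
    """Standardize continuous columns and one-hot encode categorical ones."""
    scaled = StandardScaler().fit_transform(frame[continuous])
    dummies = pd.get_dummies(frame[categorical]).to_numpy(dtype=float)
    return np.hstack([scaled, dummies])


def choose_clusters(X, k_range=range(2, 7)):
    """Ward clustering, then pick the k with the highest mean silhouette."""
    # SciPy's Ward linkage on an observation matrix uses Euclidean distance.
    Z = linkage(X, method="ward")
    scores = {}
    for k in k_range:
        labels_k = fcluster(Z, t=k, criterion="maxclust")
        scores[k] = silhouette_score(X, labels_k, metric="euclidean")
    best_k = max(scores, key=scores.get)
    return Z, best_k, fcluster(Z, t=best_k, criterion="maxclust"), scores


X = build_feature_matrix(df)
Z, best_k, labels, scores = choose_clusters(X)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("selected k:", best_k)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` for visual inspection, since the summary states that the silhouette score is to be weighed alongside the dendrogram structure rather than used alone.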
Primary Outcomes 1) Identification and characterization of distinct clinical clusters among preterm infants diagnosed with ROP. 2) Determination of early clinical factors associated with each identified cluster.
Secondary Outcomes and Definitions
1) Comparison of early morbidities (e.g., BPD, NEC, sepsis) across clusters.
2) Evaluation of antenatal/perinatal factors influencing ROP development and severity.
3) Assessment of long-term outcomes, including neurodevelopmental impairment and visual impairment, primarily at corrected 18–24 months of age, with additional analysis at 3 years of age if sufficient data are available.
Outcome Definitions
ROP: Defined by the presence and stage as diagnosed by ophthalmologic examination, based on the international classification of ROP.
Sepsis: Defined as culture-proven bloodstream infection with accompanying clinical symptoms and/or treatment.
BPD: Defined based on oxygen requirement at 36 weeks postmenstrual age (PMA), according to NICHD 2001 criteria.
NEC: Defined as stage II or higher per modified Bell’s criteria.
Neurodevelopmental impairment & visual disability: Defined according to standardized follow-up protocols of KNN.
Protocols
1. Data collection
1) Clinical data will be extracted from the KNN registry.
2) Variables recorded before ROP diagnosis will be used for clustering; follow-up data at 18–24 months and 3 years will be used for long-term outcome analysis.
2. Data standardization and preprocessing
1) One-hot encoding for categorical variables.
2) Standardization for continuous variables.
3) Missing values will be handled using multiple imputation techniques or substitutions.
3. Machine learning analysis
1) Hierarchical clustering (Ward’s method, Euclidean distance).
2) Cluster number determined by dendrogram and silhouette width.
3) Visualization by dendrograms and PCA.
4) Post hoc comparisons for cluster characterization.
4. Ethical considerations
1) IRB approval will be obtained prior to data analysis.
2) All data will be fully anonymized; no identifiable information will be used.
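The PCA visualization and post hoc comparison steps of the protocol could look something like the sketch below. The summary does not name specific statistical tests, so the Kruskal-Wallis test for a continuous variable and the chi-square test for a categorical one are assumptions chosen for illustration; the data, cluster labels, column names, and helper functions are likewise synthetic or hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, kruskal
from sklearn.decomposition import PCA

# Synthetic stand-in data with hypothetical cluster labels 1..3.
rng = np.random.default_rng(1)
n = 150
cluster = rng.integers(1, 4, size=n)
df = pd.DataFrame({
    "cluster": cluster,
    # Gestational age is kept out of clustering but compared afterwards.
    "gestational_age_weeks": rng.normal(28.0, 2.0, size=n) - 0.5 * cluster,
    "bpd": rng.random(n) < 0.2 * cluster,
})
# Stand-in for the standardized, one-hot encoded clustering matrix.
X = rng.normal(size=(n, 6)) + cluster[:, None]


def compare_continuous(frame, column):
    """Kruskal-Wallis test of one continuous variable across clusters."""
    groups = [g[column].to_numpy() for _, g in frame.groupby("cluster")]
    return kruskal(*groups)


def compare_categorical(frame, column):
    """Chi-square test of independence between cluster and a category."""
    table = pd.crosstab(frame["cluster"], frame[column])
    chi2, p, dof, _expected = chi2_contingency(table)
    return chi2, p, dof


h_stat, p_ga = compare_continuous(df, "gestational_age_weeks")
chi2, p_bpd, dof = compare_categorical(df, "bpd")

# Two-dimensional PCA projection, e.g. for a scatter plot colored by cluster.
coords = PCA(n_components=2).fit_transform(X)

print(f"gestational age: H={h_stat:.2f}, p={p_ga:.4f}")
print(f"BPD: chi2={chi2:.2f}, dof={dof}, p={p_bpd:.4f}")
```

When many variables are compared across clusters, some correction for multiple testing would normally be considered; that choice is left open here because the summary does not specify one.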
Funding None