Clin Chem. 2025 Jul 18:hvaf074. doi: 10.1093/clinchem/hvaf074. Online ahead of print.
ABSTRACT
BACKGROUND: Iron deficiency (ID) is a prevalent global health issue with a major impact on well-being. Early detection of ID is crucial but challenging due to its nonspecific symptoms and the limitations of traditional diagnostic tests, which are impractical for large-scale screening. This study proposes a machine learning (ML) approach using complete blood count (CBC) data and cell population data (CPD) for detecting ID in the general population.
METHODS: We retrospectively collected patient data from 3 hospitals to develop and validate 5 ML models using CBC, CPD, and demographic information. After identifying the best-performing model, we evaluated the impact of various feature sets and also assessed model performance across different subgroups to ensure robustness in diverse populations. The model was also deployed and integrated into clinical workflows.
RESULTS: We retrospectively enrolled 9608 adult patients across emergency, inpatient, and outpatient departments from 3 hospitals; the prevalence of ID ranged from 17.4% to 19.6%. The ML model achieved an area under the receiver operating characteristic curve (AUROC) exceeding 0.94 and an area under the precision-recall curve (AUPRC) exceeding 0.83 during validation. After integration into the clinical system, the model maintained stable real-world performance, with an AUROC of 0.948 and an AUPRC of 0.854. Subgroup analysis showed lower performance in male and nonanemic populations.
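For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUROC and AUPRC can be computed from binary labels and model scores. It is not the authors' code: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical, and AUPRC is estimated here by average precision, which is one common convention and may differ from the estimator used in the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of the precision values at the rank of
    each positive case. Assumes scores are not tied (a simplification)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data, NOT from the study: 1 = ID, 0 = no ID.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_auprc = average_precision(toy_labels, toy_scores)
print(f"AUROC={toy_auroc:.3f}  AUPRC={toy_auprc:.3f}")
```

Because a random classifier's AUPRC is roughly equal to the prevalence of the positive class, AUPRC is often reported alongside AUROC when the positive class is a minority, as it is here with ID prevalence below 20%.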
CONCLUSIONS: Our study highlights the effectiveness of an ML model integrating CPD with CBC parameters for screening ID in the general population. Leveraging routine blood data without requiring biochemical tests, the model enables efficient and consistent ID screening across cohorts.
PMID:40679925 | DOI:10.1093/clinchem/hvaf074