Clin Chem Lab Med. 2025 May 28. doi: 10.1515/cclm-2025-0491. Online ahead of print.
ABSTRACT
OBJECTIVES: Machine learning (ML) models trained on laboratory data can support early sepsis prediction. However, analytical bias in laboratory measurements can compromise their performance and validity in real-world settings. We aimed to evaluate how analytically acceptable bias may affect the validity and generalizability of ML models trained on laboratory data.
METHODS: A support vector machine (SVM) model for sepsis prediction was developed using complete blood count and erythrocyte sedimentation rate (ESR) data from outpatients (CS, n=104) and from patients on acute inflammatory status wards (SS, n=107). Twenty-six bias conditions were derived from the analytical performance specifications (APS) for white blood cells (WBC), platelets (PLT), and ESR. The diagnostic performance under each of the 26 conditions was compared with that obtained on the original dataset.
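The bias-perturbation design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the bias is proportional (a percentage applied multiplicatively to each measured value), that it is applied only to the evaluation data while the trained model stays fixed, and that AUC is computed with the rank-based (Mann-Whitney) estimator. The names `apply_bias`, `auc`, `auc_under_bias` and `toy_score` are hypothetical; `toy_score` is a stand-in for a trained SVM decision function.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def apply_bias(rows: Sequence[Row], bias_pct: Dict[str, float]) -> List[Row]:
    """Return new rows with a proportional bias (in %) applied to selected analytes.

    Assumption: a bias of +7.7 for "WBC" multiplies every WBC value by 1.077.
    Analytes not listed in bias_pct are copied unchanged; inputs are not mutated.
    """
    return [
        {name: value * (1.0 + bias_pct.get(name, 0.0) / 100.0) for name, value in row.items()}
        for row in rows
    ]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_under_bias(model: Callable[[Row], float], rows: Sequence[Row],
                   labels: Sequence[int], bias_pct: Dict[str, float]) -> float:
    """Score bias-shifted copies of the evaluation rows with a fixed, already-trained model."""
    shifted = apply_bias(rows, bias_pct)
    return auc([model(r) for r in shifted], labels)


def toy_score(r: Row) -> float:
    """Hypothetical linear stand-in for an SVM decision function (not the study's model)."""
    return 0.3 * r["WBC"] + 0.05 * r["ESR"] - 0.002 * r["PLT"]


if __name__ == "__main__":
    # Hypothetical toy data: WBC and PLT in 10^9/L, ESR in mm/h; label 1 = sepsis.
    rows = [
        {"WBC": 14.2, "PLT": 180.0, "ESR": 55.0},
        {"WBC": 6.1, "PLT": 260.0, "ESR": 12.0},
        {"WBC": 11.8, "PLT": 140.0, "ESR": 70.0},
        {"WBC": 7.4, "PLT": 300.0, "ESR": 20.0},
    ]
    labels = [1, 0, 1, 0]
    baseline = auc_under_bias(toy_score, rows, labels, {})
    # Single-analyte and combined bias conditions, in % (values taken from the APS tiers).
    for cond in ({"WBC": -5.1}, {"PLT": -6.7}, {"ESR": 31.6},
                 {"WBC": 7.7, "PLT": -6.7, "ESR": 31.6}):
        print(cond, "AUC change:", auc_under_bias(toy_score, rows, labels, cond) - baseline)
```

Keeping the model frozen and shifting only the evaluation inputs mimics deploying a model in a laboratory whose results differ by a constant proportional bias. In practice, a trained classifier (for example a scikit-learn SVC) would replace `toy_score`, and confidence intervals would be needed before interpreting any AUC change.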
RESULTS: SVM performance on the original dataset was AUC 90.6 % [95 %CI: 80.6-98.7 %]. Minimum, desirable and optimum acceptable biases were 7.7, 5.1 and 2.6 % for WBC; 6.7, 4.5 and 2.2 % for PLT; and 31.6, 21.1 and 10.5 % for ESR, respectively. Across the conditions, AUC values included 89.8 % [95 %CI: 79.0-97.7 %] (PLT bias -6.7 %), 89.5 % [95 %CI: 79.1-98.0 %] (ESR bias +31.6 %) and 90.4 % [95 %CI: 79.3-98.4 %] (WBC bias -5.1 %). With combined biases, the lowest AUC was 87.8 % [95 %CI: 75.9-96.6 %]. No statistically significant differences in AUC relative to the original dataset were observed (p>0.05).
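The three bias tiers reported above are consistent with the usual biological-variation convention, in which desirable bias is 0.25·sqrt(CVI² + CVG²) (CVI and CVG being the within- and between-subject coefficients of variation, in %), minimum bias is 1.5× desirable and optimum bias is 0.5× desirable. The abstract does not state how its APS were obtained, so the check below is only a consistency sketch under that assumption; the names `desirable_bias`, `bias_tiers`, `REPORTED` and `consistent_with_tiers` are hypothetical, and no CVI/CVG values from the study are assumed.

```python
import math
from typing import Dict, Tuple

# Multipliers of the desirable specification in the biological-variation model
# (assumed convention: minimum = 1.5x desirable, optimum = 0.5x desirable).
TIER_FACTORS = {"minimum": 1.5, "desirable": 1.0, "optimum": 0.5}


def desirable_bias(cv_i: float, cv_g: float) -> float:
    """Desirable bias (%) = 0.25 * sqrt(CVI^2 + CVG^2), with both CVs in %."""
    return 0.25 * math.sqrt(cv_i ** 2 + cv_g ** 2)


def bias_tiers(desirable: float) -> Dict[str, float]:
    """Derive the minimum/desirable/optimum bias limits from the desirable one."""
    return {tier: factor * desirable for tier, factor in TIER_FACTORS.items()}


# Values reported in the abstract, in %: (minimum, desirable, optimum).
REPORTED: Dict[str, Tuple[float, float, float]] = {
    "WBC": (7.7, 5.1, 2.6),
    "PLT": (6.7, 4.5, 2.2),
    "ESR": (31.6, 21.1, 10.5),
}


def consistent_with_tiers(reported: Tuple[float, float, float], tol: float = 0.1) -> bool:
    """True if minimum/optimum match 1.5x/0.5x desirable within a rounding tolerance.

    The tolerance absorbs rounding of the published values to one decimal place.
    """
    minimum, desirable, optimum = reported
    tiers = bias_tiers(desirable)
    return abs(tiers["minimum"] - minimum) <= tol and abs(tiers["optimum"] - optimum) <= tol


if __name__ == "__main__":
    for analyte, values in REPORTED.items():
        print(analyte, bias_tiers(values[1]), consistent_with_tiers(values))
```

Recomputing from the rounded desirable values gives, for example, 1.5 × 5.1 = 7.65 % for WBC, which rounds to the published 7.7 % only if the unrounded desirable bias was slightly above 5.1 %; hence the comparison uses a tolerance rather than exact equality.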
CONCLUSIONS: Bias can influence model performance, depending on which parameters are affected and how their biases combine. Developing new validation strategies to assess the impact of analytical bias in laboratory data on ML models could improve their reliability.
PMID:40440484 | DOI:10.1515/cclm-2025-0491