A6170 - Discovery of Novel COPD Subtypes Via Cluster Analysis of Clinical Features Extracted from Electronic Health Records
Author Block: M. Pikoula (1), J. K. Quint (2), K. Direk (1), F. Nissen (3), A. Gonzalez-Izquierdo (1), H. Hemingway (1), L. Smeeth (3), S. Denaxas (1); (1) Institute of Health Informatics, UCL, London, United Kingdom, (2) National Heart and Lung Institute, Imperial College London, London, United Kingdom, (3) London School of Hygiene and Tropical Medicine, London, United Kingdom.
Rationale
The clinical presentation of patients with chronic obstructive pulmonary disease (COPD) is notably heterogeneous, but there is little consensus on how COPD subtypes should be defined. Machine learning approaches such as clustering can enable the discovery of novel subtypes and guide the characterization of their clinical manifestations. We sought to identify COPD subtypes by applying an unsupervised machine learning method to an unselected COPD population derived from phenotypically rich electronic health records (EHR).
Methods
Our study used primary care EHR data from the Clinical Practice Research Datalink (CPRD), which offers anonymized, longitudinal data from the UK and has been shown to be representative of the UK population. COPD patients were identified using a previously validated, rule-based phenotyping algorithm. We extracted clinical feature values for: a) demographics (age, gender), b) health behaviour (smoking status), c) disease severity (GOLD stage, acute exacerbations), d) physiological measures (eosinophil counts, spirometry results), e) treatment (COPD medication) and f) comorbidities (heart failure, asthma, atopy, GERD, hypertension, anxiety, depression), using previously validated phenotyping algorithms. Patients were clustered using the K-modes algorithm, an extension of the K-means clustering algorithm to categorical data, which groups patients based on the similarity of their clinical features. The resulting clusters and the optimal number of clusters (k) were evaluated using a) variables not included as clustering features (rate of acute exacerbation of COPD (AECOPD), diagnosis of asthma) and b) internal metrics of cluster cohesion and separation.
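As an illustration of the clustering step only, the following is a minimal sketch of K-modes in plain Python. It is not the study's implementation. The toy records, the four feature columns, the function names and the choice of k=2 are all hypothetical. The sketch assumes every feature is categorical, measures dissimilarity as the number of mismatching features, and uses the per-feature mode as the cluster centre.

```python
import random
from collections import Counter


def mismatches(a, b):
    """Simple matching dissimilarity: number of features on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each feature column (the cluster 'mode')."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, n_iter=20, seed=0):
    """Cluster categorical records; returns (labels, modes)."""
    rng = random.Random(seed)
    # Start from k distinct records chosen at random (an assumed initialisation).
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: mismatches(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute each cluster's mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


def total_cost(records, labels, modes):
    """Within-cluster dissimilarity, a simple internal measure of cohesion."""
    return sum(mismatches(r, modes[lab]) for r, lab in zip(records, labels))


# Hypothetical toy data: (gender, smoking status, GOLD stage, asthma diagnosis).
records = [
    ("F", "ex", "1", "yes"),
    ("F", "ex", "2", "yes"),
    ("F", "ex", "1", "yes"),
    ("M", "current", "3", "no"),
    ("M", "current", "4", "no"),
    ("M", "current", "3", "no"),
]

labels, modes = k_modes(records, k=2)
cost = total_cost(records, labels, modes)
print(labels, modes, cost)
```

In practice, continuous measures such as age or FEV1 would need to be binned into categories before a method of this kind could be applied, and the total cost could be compared across several values of k as one way of choosing the number of clusters. Both points are assumptions about how such an analysis might be set up, not details reported in this abstract.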
Results
A total of 34,662 COPD patients were included in the study between September 1987 and March 2016. The resulting clusters differed significantly in AECOPD rates, GOLD stages, eosinophil counts, gender and smoking status. Notably, in the five-cluster (k=5) solution, two clusters with equal and lower-than-average AECOPD rates were identified. The first cluster consisted predominantly of female ex-smokers with higher FEV1 values, whereas the second cluster was younger and consisted predominantly of male current smokers with lower FEV1 values. The latter was also the cluster of patients least likely to have a diagnosis of asthma and atopy. The cluster with a significantly higher mean AECOPD rate, despite relatively healthy FEV1 values, was also the one whose patients (predominantly female) were most likely to also have a diagnosis of asthma and high eosinophil counts.
Conclusions
By combining a diverse set of clinical features with an unsupervised machine learning method (k-modes), we identified clusters with marked differences in asthma prevalence and AECOPD rates. These findings suggest that further research is merited to uncover the underlying pathophysiological differences in COPD presentations.