At Kaiser Permanente, I worked for 12 weeks on the Care Improvement Research Team, developing a model to improve the diagnosis of patients with Chronic Obstructive Pulmonary Disease (COPD) using Electronic Health Records (EHR). COPD is a progressive lung disease that includes emphysema, chronic bronchitis, refractory (non-reversible) asthma, and some forms of bronchiectasis. COPD is currently the fourth most frequent cause of death worldwide and affects around 16 million Americans. However, an estimated 50% of people with COPD are undiagnosed and do not receive treatment. [1]
The gold standard for COPD diagnosis is spirometry, but only about 20% of patients have this test in their EHR, and for those who lack it the test is likely missing not at random (MNAR). There is therefore a large clinical need to improve the diagnosis of COPD using the complex data relationships available in patients’ EHR.
I used a Bayesian latent variable analysis to describe how observed and missing traits in the EHR relate to the latent phenotype of COPD. The model relates medications, comorbidities, clinical ICD-9 diagnostic codes, spirometry tests, risk factors, and health-seeking behaviors to a person’s latent COPD phenotype. This method provides an individualized probability of having COPD that retains the uncertainty in each person’s disease phenotype: values close to 50% indicate that the model cannot confidently classify the patient. I fit the model with Markov Chain Monte Carlo (MCMC) methods using the R package runjags.
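To make the idea of an individualized probability concrete, here is a deliberately simplified two-class latent variable model written out as equations. It is an illustration under my own assumptions, not the fitted model: every indicator is treated as binary and conditionally independent given COPD status, and the symbols $z_i$ (latent COPD status of patient $i$), $\pi$ (prevalence), $\theta_{jk}$ (probability that indicator $j$ is present in class $k$), and $O_i$ (the set of indicators recorded for patient $i$) are introduced here only for this sketch. It also does not model the MNAR mechanism described above.

```latex
\begin{align*}
z_i &\sim \mathrm{Bernoulli}(\pi), \\
x_{ij} \mid z_i = k &\sim \mathrm{Bernoulli}(\theta_{jk}), \qquad j \in O_i,\ k \in \{0, 1\}, \\
P(z_i = 1 \mid x_i) &=
  \frac{\pi \prod_{j \in O_i} \theta_{j1}^{x_{ij}} (1 - \theta_{j1})^{1 - x_{ij}}}
       {\pi \prod_{j \in O_i} \theta_{j1}^{x_{ij}} (1 - \theta_{j1})^{1 - x_{ij}}
        + (1 - \pi) \prod_{j \in O_i} \theta_{j0}^{x_{ij}} (1 - \theta_{j0})^{1 - x_{ij}}}.
\end{align*}
```

When the recorded indicators are about equally likely in both classes and $\pi$ is near one half, the last expression sits near 50%, which is the uncertain case described above.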
Beyond the model, I also built a function that predicts new patients’ phenotypes from the estimated model. The code for the model and the function is available in my GitHub repository. I have also provided a document that I intend for researchers to use as a learning tool and as a reference for their own adaptations. As I mentioned on my opening page, public health succeeds when we all succeed; reach out if you have questions.
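For readers who want a feel for what predicting a new patient involves, below is a minimal Python sketch. It is not the R/runjags code in the repository. It assumes a simplified two-class model with binary indicators that are conditionally independent given COPD status, and it assumes posterior draws are available as plain dictionaries. The function names, indicator names (`smoker`, `inhaler_rx`), and all numbers are hypothetical. Indicators that were not recorded are simply skipped, which treats them as ignorable and is a weaker assumption than the MNAR handling discussed above.

```python
import math


def posterior_prob(patient, prevalence, p_given_copd, p_given_no_copd):
    """P(COPD | recorded indicators) for one parameter draw, via Bayes' rule."""
    log_yes = math.log(prevalence)
    log_no = math.log(1.0 - prevalence)
    for name, value in patient.items():
        if value is None:
            # Indicator not recorded: skipped (assumption, ignores MNAR).
            continue
        p1 = p_given_copd[name]
        p0 = p_given_no_copd[name]
        log_yes += math.log(p1 if value else 1.0 - p1)
        log_no += math.log(p0 if value else 1.0 - p0)
    # Normalize on the log scale for numerical stability.
    top = max(log_yes, log_no)
    w_yes = math.exp(log_yes - top)
    w_no = math.exp(log_no - top)
    return w_yes / (w_yes + w_no)


def predict_new_patient(patient, draws):
    """Average the per-draw probability over posterior draws."""
    probs = [
        posterior_prob(
            patient, d["prevalence"], d["p_given_copd"], d["p_given_no_copd"]
        )
        for d in draws
    ]
    return sum(probs) / len(probs)


# Hypothetical posterior draws (made-up values, for illustration only).
draws = [
    {
        "prevalence": 0.10,
        "p_given_copd": {"smoker": 0.8, "inhaler_rx": 0.7},
        "p_given_no_copd": {"smoker": 0.3, "inhaler_rx": 0.1},
    },
    {
        "prevalence": 0.20,
        "p_given_copd": {"smoker": 0.7, "inhaler_rx": 0.6},
        "p_given_no_copd": {"smoker": 0.2, "inhaler_rx": 0.2},
    },
]

new_patient = {"smoker": True, "inhaler_rx": None}  # inhaler status not recorded
print(round(predict_new_patient(new_patient, draws), 3))
```

Averaging over draws, rather than plugging in a single point estimate, carries the parameter uncertainty from the MCMC fit through to the predicted probability.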
Reference Document:
A Bayesian & EHR-derived Latent Phenotyping model for COPD
Suggested citation:
McGreevy, K.M., Shen, E. 2019. “A Bayesian & EHR-derived Latent Phenotyping model for COPD”. Research and Evaluation Department, Kaiser Permanente Southern California.
1. The Top 10 Causes of Death. 2019. World Health Organization Factsheet.
http://www.who.int/mediacentre/factsheets/fs310/en/.
