Automated Informatics Tool May Streamline Genetic Diagnoses

Published on in CHOP News

A scientist in Children's Hospital of Philadelphia's (CHOP) Department of Biomedical and Health Informatics (DBHi) and the Raymond G. Perelman Center for Cellular and Molecular Therapeutics developed a new software tool that rapidly extracts phenotype information from electronic health records (EHRs) to facilitate genetic diagnoses. The tool, EHR-Phenolyzer, identifies relevant information from a patient’s medical history narrative in EHRs and translates the data into standardized terms that the tool then correlates with genes that underlie often-puzzling genetic diseases.

Kai Wang, PhD, a data scientist at DBHi, co-led the study, published online June 28 in the American Journal of Human Genetics, with Chunhua Weng, PhD, of the Data Science Institute and Department of Biomedical Informatics at Columbia University. The study team performed most of the initial research at Columbia, before Wang joined CHOP.

The EHR-Phenolyzer evolved from an earlier computational tool, Phenolyzer, which Wang developed at the University of Southern California. Both draw on phenotypes — the observable physical manifestations of disease — to help identify gene mutations that give rise to patients’ conditions. However, Phenolyzer requires clinical experts to manually input phenotypic data in specific formats. The EHR-Phenolyzer automates that process, extracting clinically relevant information from the text of a narrative patient history such as that written by a genetic counselor or clinical geneticist. To draw out that clinical information, the researchers used natural language processing, a computer science approach long used in analyzing literature and historical texts, but only recently applied to medical genetics.
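To make the extraction step more concrete, here is a minimal Python sketch that pulls phenotype mentions out of a narrative note. It assumes a tiny hand-written phrase lexicon and a crude sentence-level negation check. The lexicon, the cue words and the function name `extract_phenotypes` are hypothetical illustrations. The article does not describe EHR-Phenolyzer's actual natural language processing pipeline, which would be far more sophisticated than substring matching.

```python
import re

# Hypothetical mini-lexicon mapping lay or clinical phrases to phenotype labels.
# A real system would rely on a full NLP pipeline and ontology, not a hand-written dict.
LEXICON = {
    "seizure": "Seizure",
    "developmental delay": "Global developmental delay",
    "microcephaly": "Microcephaly",
    "small head": "Microcephaly",
    "hypotonia": "Hypotonia",
    "low muscle tone": "Hypotonia",
}

# Crude negation cues: a cue earlier in the same sentence marks the finding as absent.
NEGATION = re.compile(r"\b(no|not|denies|without|negative for)\b")


def extract_phenotypes(note: str) -> set[str]:
    """Return phenotype labels that a free-text note mentions affirmatively."""
    found = set()
    # Split the narrative into rough sentences on periods, semicolons and newlines.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for phrase, label in LEXICON.items():
            idx = sentence.find(phrase)
            if idx == -1:
                continue
            # Skip the mention if a negation cue precedes it in the sentence.
            if NEGATION.search(sentence[:idx]):
                continue
            found.add(label)
    return found


note = (
    "Patient has frequent seizures and developmental delay. "
    "No hypotonia noted; mother reports a small head."
)
print(sorted(extract_phenotypes(note)))
# → ['Global developmental delay', 'Microcephaly', 'Seizure']
```

The negated "No hypotonia" is skipped, while the lay phrase "small head" is still picked up, which is the kind of free-text variation an automated extractor has to handle.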

The EHR-Phenolyzer then translates the extracted descriptors into the standardized vocabulary of the Human Phenotype Ontology (HPO) and matches those terms to candidate causal genes, prioritized by how strongly the genes correlate with a patient’s phenotype. The overall goal, said Wang, is to expedite and improve genetic diagnoses by efficiently bridging patient data in health records to the constantly growing mass of genomic data.
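The standardize-then-rank idea can be sketched in a few lines of Python, assuming phenotype labels have already been extracted. The four HPO IDs below are real HPO terms, but the gene-to-phenotype table and the gene names `GENE_A` through `GENE_C` are hypothetical placeholders, and the scoring is a simple term-overlap count swapped in for illustration. It is not Phenolyzer's actual prioritization method.

```python
# Phenotype labels -> Human Phenotype Ontology term IDs (small illustrative subset).
HPO_IDS = {
    "Seizure": "HP:0001250",
    "Global developmental delay": "HP:0001263",
    "Microcephaly": "HP:0000252",
    "Hypotonia": "HP:0001252",
}

# Hypothetical gene-to-phenotype annotations; GENE_A..GENE_C are placeholders.
GENE_TO_HPO = {
    "GENE_A": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "GENE_B": {"HP:0001250", "HP:0001252"},
    "GENE_C": {"HP:0000252"},
}


def rank_genes(labels: set[str]) -> list[tuple[str, int]]:
    """Rank genes by how many of the patient's HPO terms each is annotated with."""
    # Standardize: translate labels into HPO IDs, dropping labels not in the map.
    patient_terms = {HPO_IDS[label] for label in labels if label in HPO_IDS}
    # Score each gene by the size of its overlap with the patient's terms.
    scores = {gene: len(terms & patient_terms) for gene, terms in GENE_TO_HPO.items()}
    hits = [(gene, score) for gene, score in scores.items() if score > 0]
    # Highest overlap first; break ties alphabetically for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


print(rank_genes({"Seizure", "Global developmental delay", "Microcephaly"}))
# → [('GENE_A', 3), ('GENE_B', 1), ('GENE_C', 1)]
```

Mapping to shared ontology IDs first is what lets differently worded notes land on the same gene annotations. A more realistic scorer might also weight rare, specific terms more heavily than common ones and use the ontology's parent-child structure.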

The study team validated the EHR-Phenolyzer by assessing its performance in four independent cohorts of adults and children with suspected or diagnosed genetic diseases from two centers. In more than half of the individuals, the gene carrying the actual disease-causing mutation ranked among the EHR-Phenolyzer’s top 100 candidate genes, and in some cases within the top 10.
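Results like "within the top 100" or "within the top 10" describe a top-k hit rate. Below is a minimal Python sketch of how such a metric could be computed, assuming each patient has one known causal gene and one ranked candidate list. The function name, patient IDs and gene names are hypothetical, the data are made up rather than taken from the study, and the study's exact evaluation procedure may differ.

```python
def top_k_hit_rate(rankings: dict[str, list[str]], causal: dict[str, str], k: int) -> float:
    """Fraction of patients whose known causal gene is among their top-k candidates."""
    hits = sum(1 for patient, ranked in rankings.items() if causal[patient] in ranked[:k])
    return hits / len(rankings)


# Toy, made-up rankings for three hypothetical patients (not study data).
rankings = {
    "p1": ["G1", "G2", "G3"],
    "p2": ["G4", "G5"],
    "p3": ["G6"],
}
causal = {"p1": "G2", "p2": "G9", "p3": "G6"}

for k in (1, 2, 100):
    print(k, top_k_hit_rate(rankings, causal, k))
```

With this toy data the rate rises from one in three patients at k = 1 to two in three at k = 2. Patient p2's causal gene is never ranked, so no cutoff captures it.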

The researchers plan to further refine the tool’s speed and effectiveness. Other possible goals include developing a patient-facing version of Phenolyzer that can analyze self-reported information that may not be captured in the EHR, and extending the tool to handle languages other than English.

Working with the engineering team at DBHi, Wang and colleagues have begun a pilot project at CHOP to evaluate EHR-Phenolyzer using the hospital’s own EHR system. They have set up an internal web server at CHOP so that clinicians can use the tool through a web interface.

“This tool is especially relevant to a large pediatric hospital like ours, which sees many children with undiagnosed hereditary diseases,” said Wang. “Our goal is to reduce the duration, uncertainty and costs of the ‘diagnostic odyssey’ experienced by many affected children and their families, and to help guide them more quickly to the most appropriate clinical care.”

In addition to his CHOP position, Wang is an Associate Professor of Pathology and Laboratory Medicine in the Perelman School of Medicine at the University of Pennsylvania.

For more about this work, see the press release from Columbia University’s Data Science Institute.

Jung Hoon Son et al., “Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes,” American Journal of Human Genetics, published online June 28, 2018. https://doi.org/10.1016/j.ajhg.2018.05.010