Definition of a Novel Imaging Quality Measure for the Evaluation of Emergency Department Patients with Suspected Pulmonary Embolism: Use of AI NLP to Validate and Automate It

CT pulmonary angiography (CTPA) utilization rates for patients with suspected pulmonary embolism (PE) in the Emergency Department (ED) have increased steadily, with associated radiation exposure, costs and overdiagnosis. A quality measure is needed to precisely assess efficiency of CTPA utilization, normalized to numbers of patients presenting with suspected PE and based on patient signs and symptoms. This study used Artificial Intelligence approaches such as ontology-driven natural language processing (NLP) to develop, automate, and validate SPE (“Suspected Pulmonary Embolism [PE]”), a measure determining CTPA utilization in ED patients with suspected PE. This retrospective study was conducted 4/1/2013-3/31/2014 in a Level-1 ED. An NLP engine processed “Chief Complaint” sections of ED documentation, identifying patients with PE-suggestive symptoms based on four Concept Unique Identifiers (CUIs: shortness of breath, chest pain, pleuritic chest pain, anterior pleuritic chest pain). SPE was defined as the proportion of ED visits for patients with potential PE undergoing CTPA. Manual reviews determined specificity, sensitivity and negative predictive value (NPV). Among 5,768 ED visits with at least one SPE CUI, 795 CTPAs were performed, giving SPE=13.8% (795/5,768). AI and NLP identified patients with relevant CUIs with specificity=0.94 [95%CI (0.89-0.96)]; sensitivity=0.73 [95%CI (0.45-0.92)]; NPV=0.98. Using NLP on ED documentation can identify patients with suspected PE to compute a more clinically relevant CTPA measure. This measure might then be used in an audit-and-feedback process to increase the appropriateness of imaging of patients with suspected PE in the ED.

Evidence-based recommendations exist to guide clinicians in the diagnostic workup of patients with suspected PE; the combination of risk stratification using a validated tool (e.g., the Wells criteria [4,7]), supplemented by D-dimer measurement [8], has been used for over 15 years [9] and adopted by a number of professional societies [10,11]. However, current measures of CTPA utilization or adherence to Wells criteria do not accurately capture providers' adherence to evidence, as patients who are appropriately not imaged are not well represented in existing measures. For example, appropriateness is often determined by using a denominator of patients who underwent CTPA and does not include patients in whom imaging was not ordered (who may have been excluded from imaging using the Pulmonary Embolism Rule-Out Criteria [8] or clinical gestalt). Similarly, using overall CTPA use per ED visit does not limit the measure denominator to only patients with suspected PE, so comparisons between EDs with different prevalences of disease are not meaningful.
Thus, there is a need for a quality measure that precisely assesses the efficiency of CTPA utilization normalized to the number of patients with suspected PE who present to the ED, based on patients' signs and symptoms at presentation. The purpose of our study was to develop, automate, and validate a new tool, using unstructured data from clinical notes, to define a cohort of patients with suspected PE, which can then be used to develop a quality measure, Suspected Pulmonary Embolism (SPE).

Study Setting and Human Subjects Approval
This HIPAA-compliant retrospective cohort study was conducted between April 1, 2013 and March 31, 2014 in the ED of an urban Level-I adult trauma center with ~60,000 visits annually. It was approved by the Institutional Review Board (Protocol Number: 2013P000267). Subsequent to the study period, the hospital changed its electronic medical record system, making the data capture necessary for this study more difficult.

Data Sources
Data sources included the ED information system, the radiology information system (RIS), and the computerized physician order entry (CPOE) system. For each ED visit, we obtained the text of the ED attending notes as well as the text of the "Chief Complaint" field. To construct the denominator for the imaging measure, we sought to quantify the cohort of ED patients with signs and symptoms at presentation suggestive of PE. For this purpose we used an Artificial Intelligence approach, an ontology-based natural language processing (NLP) tool [12]. After consulting a multi-disciplinary group of clinical, informatics, and imaging experts, we based our algorithm on four of the most common signs and symptoms of PE as represented by Concept Unique Identifiers (CUIs) extracted from the ED note "Chief Complaint" field: shortness of breath (C0013404), chest pain (C0008031), pleuritic pain (C0008033, C0423632) and anterior pleuritic pain (C3532941).

NLP Engine and Customization
The AI NLP platform cTAKES version 3.0.1 [13] (including YTEX [14,15]) was customized with RadLex [16] and the latest releases of the SNOMED-CT vocabulary files using the NCI-supported Knowledge Representation languages RDF and process definitions from the MetamorphoSys sub-setting utility [17]. The CUIs were extracted using a SQL query with multiple joins on the unique batch name of the job, resulting in a table in which each row contained the CUI and the ID of the input "Chief Complaint" text snippet. We also included polarity in the extraction query; a polarity of -1 corresponded to a negation of the named entity [13].
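The extraction step described above can be illustrated with a minimal sketch. It is not the query used in the study: the two-table schema (`document`, `annotation`), the column names, the batch name and the toy rows are all hypothetical stand-ins for the actual annotation tables, and the sketch assumes that negated mentions (polarity of -1) are excluded from the cohort. Only the five CUI codes are taken from the Data Sources section.

```python
import sqlite3

# CUIs listed in the Data Sources section.
SPE_CUIS = {"C0013404", "C0008031", "C0008033", "C0423632", "C3532941"}


def build_demo_db():
    """Create a toy in-memory database (hypothetical schema, not the real one)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (doc_id INTEGER PRIMARY KEY, batch_name TEXT, visit_id TEXT);
        CREATE TABLE annotation (doc_id INTEGER, cui TEXT, polarity INTEGER);
        """
    )
    con.executemany(
        "INSERT INTO document VALUES (?, ?, ?)",
        [(1, "cc_batch", "V1"), (2, "cc_batch", "V2"),
         (3, "cc_batch", "V3"), (4, "other_batch", "V4")],
    )
    con.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        [
            (1, "C0013404", 1),   # shortness of breath, affirmed
            (2, "C0008031", -1),  # chest pain, negated
            (3, "C0000000", 1),   # placeholder code that is not an SPE CUI
            (4, "C0008031", 1),   # affirmed, but belongs to another batch
        ],
    )
    return con


def suspected_pe_visits(con, batch_name):
    """Return visit IDs in one batch with at least one non-negated SPE CUI."""
    placeholders = ",".join("?" for _ in SPE_CUIS)
    query = f"""
        SELECT DISTINCT d.visit_id
        FROM document AS d
        JOIN annotation AS a ON a.doc_id = d.doc_id
        WHERE d.batch_name = ?
          AND a.polarity <> -1
          AND a.cui IN ({placeholders})
        ORDER BY d.visit_id
    """
    rows = con.execute(query, [batch_name, *sorted(SPE_CUIS)])
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(suspected_pe_visits(build_demo_db(), "cc_batch"))
```

In this toy data only visit V1 qualifies: V2 is negated, V3 carries an unrelated concept, and V4 belongs to a different batch.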

NLP Validation Process
To assess the accuracy of the AI NLP-based PE cohort discovery process, we conducted a manual validation in which the results of a human-expert classification were compared to those extracted by the NLP algorithm. A physician research assistant was instructed and trained by an attending emergency physician to perform manual chart review classification while blinded to the results of the AI NLP-based classification. A validation sample of 245 cases (5% of cases) was reviewed, and 10% of these were overread by the attending emergency physician.

Outcome Measures
As

Statistical Analyses
All analyses and visualizations were carried out in the R statistical programming environment [18], version 3.0.2. We used Pearson r and t-test statistics to quantify correlation and similarity of distributions between monthly series of the specific measures.
P-values<0.05 were considered statistically significant. The agreement between PE cohort discovery using the AI NLP algorithm and manual chart review was compared using sensitivity and specificity.

Figure 1 displays the results of a histogram analysis of the "Chief Complaint" field content across the mapping CTPA <-> ED visit. Notably, the large third bar corresponds to an empty "Chief Complaint" field. In addition, the shape of the distribution has a long tail corresponding to symptoms non-specific for PE, e.g., "fever" or "weakness".

Discussion
We have introduced and computed a new measure of utilization of CTPA imaging in patients with suspected PE in the ED. In contrast with existing imaging utilization metrics, SPE is normalized to the number of patients in whom PE is suspected, a patient cohort whose identification is based on patients' signs and symptoms at presentation. We have automated the calculation of this metric by casting it as an Artificial Intelligence NLP task on unstructured clinical narratives and structured EHR documentation, and then defining the cohort of PE-suspect patients using four common CUIs.
The new measure, SPE, was 13.8%: of 5,768 ED visits by patients presenting with symptoms suggestive of PE, 795 included a CTPA.
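As a worked illustration of the SPE definition, the ratio can be reproduced from the two counts reported in the abstract (795 CTPAs and 5,768 ED visits with at least one SPE CUI). The function name `spe_rate` is introduced here for illustration only.

```python
def spe_rate(ctpa_count, suspected_pe_visits):
    """Proportion of suspected-PE ED visits in which CTPA was performed."""
    if suspected_pe_visits <= 0:
        raise ValueError("denominator must be a positive number of visits")
    return ctpa_count / suspected_pe_visits


if __name__ == "__main__":
    # Counts reported in the abstract.
    rate = spe_rate(795, 5768)
    print(f"SPE = {rate:.1%}")  # prints "SPE = 13.8%"
```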
Current imaging quality measures fail to capture the appropriate patient populations. Appropriateness-based measures require resource-intensive calculation of pretest probability and D-dimer measurement, but still exclude patients in whom these data are not available, or who were excluded prior to the determination of these values (e.g., by using the Pulmonary Embolism Rule-Out Criteria [PERC]). Conversely, global utilization measures compute the number of CTPAs performed compared to overall ED visit volume, a method that cannot take into account local prevalence of PE.
Our validation of the algorithm for detecting patients with suspected PE had a sensitivity of 73% when compared to manual chart review. This is not surprising, given the other illnesses that can present with a "Chief Complaint" of chest pain or shortness of breath. However, the specificity of 94% and the NPV of 98% are reassuring, in that we likely excluded the vast majority of patients in whom PE would not have been suspected by the treating physician.
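For readers less familiar with these validation metrics, the sketch below shows how sensitivity, specificity and NPV follow from a 2x2 comparison of the NLP classification against manual chart review. The counts in the usage example are hypothetical and are not the study's validation data, which are not reported at that level of detail here.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and NPV from 2x2 confusion counts.

    tp: NLP positive, chart review positive
    fp: NLP positive, chart review negative
    tn: NLP negative, chart review negative
    fn: NLP negative, chart review positive
    """
    if min(tp + fn, tn + fp, tn + fn) == 0:
        raise ValueError("each metric needs a non-empty denominator")
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reference positives
        "specificity": tn / (tn + fp),  # true negatives among reference negatives
        "npv": tn / (tn + fn),          # true negatives among NLP negatives
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not study data).
    for name, value in validation_metrics(tp=8, fp=5, tn=85, fn=2).items():
        print(f"{name}: {value:.2f}")
```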
In order to determine whether the four CUIs we selected to model SPE patients were an adequate definition for the cohort, we reviewed the most common indications recorded in the "Chief Complaint" field (

Limitations
Our study has a number of limitations. First and foremost, our algorithm is dependent on the quality of the data in the electronic health record, notably the presence and completeness of the "Chief Complaint" field in the ED record. For straightforward data gathering, we chose to base the algorithm on a single, pre-parsed text field from the ED notes ("Chief Complaint"), even though additional signs or symptoms might have been present in the free text of the "History of Present Illness" section. In addition, the study was conducted in a single academic healthcare center, potentially limiting generalizability. Finally, the data are somewhat dated; this was unavoidable due to a change in the electronic medical record system at our hospital.

Implications
Our findings have the potential to improve the quality of care delivery by more accurately measuring the appropriateness of CTPA use for ED patients with suspected PE. Current measures typically only include patients who have undergone CTPA, missing completely those patients who are not imaged by physicians based on clinical criteria. Thus, physicians are unable to accurately determine whether they are appropriately evaluating patients with PE when compared with their peers, limiting the utility of audit-and-feedback reporting meant to improve the appropriateness of imaging.
It would be ideal to verify our findings across different institutions in both community and academic healthcare delivery settings to determine generalizability prior to widespread adoption of this new imaging metric. Nevertheless, given the potential utility of this model for this imaging modality and indication, computing imaging utilization metrics over appropriate patient cohorts with advanced but existing public NLP tools and ontologies is likely feasible for other imaging scenarios as well.
For example, head CT imaging use in ED patients with mild traumatic brain injury (MTBI) has been shown to be disproportionately variable [20]. At the same time, mature guidelines for use of imaging in MTBI exist (e.g., the Canadian CT Head Rule) [21]. Determining the rate of head imaging for patients with suspected MTBI, using appropriate CUIs, would be much more appropriate than the broad utilization metrics currently being considered [22]. Similarly, magnetic resonance imaging use in adult primary care patients with low back pain [23] (for which guidelines [24] and point-of-care clinical decision support implementations [23] both exist) might be an appropriate target as well.

Conclusions
Use of AI NLP on physician notes in the ED can help identify patients with suspected PE by flagging specific CUIs in the "Chief Complaint" field. This should allow for computation of a more clinically relevant measure of the efficiency of CTPA use.

Declarations
Ethics approval and consent to participate
The study was approved by the Institutional Review Board of the Partners HealthCare and Brigham and Women's Hospital (Pro-