Objective: To determine how well general decision support systems perform given the data collected in an emergency department (ED).
Methods: A convenience sample of 25 patients was selected from those patients having a diagnostic question on presentation to the ED. All interactions with the patients were audiotaped and abstracted into a structured data form. All other data such as written notes, laboratory, and EKG results were also abstracted. All data were entered into two general diagnostic decision support programs (Quick Medical Reference (QMR Version 3.82, Knowledge Base 10–07–1998 Copyright University of Pittsburgh and The Hearst Corporation) and Iliad (Version 4.5 Copyright 1996 Applied Medical Informatics)). The diagnoses generated by the computer programs were compared with the final diagnoses of the ED attending.
Results: The final ED diagnosis was found in the differential diagnosis generated by Iliad and QMR 72% and 52% of the time respectively. The final ED diagnosis was found in the top 10 diagnoses 51% and 44% of the time and in the top five diagnoses 36% and 32% of the time for each program respectively. This approximates to the performance of these programs in other clinical settings.
Conclusions: Diagnostic decision support software has the same success in finding the “correct” diagnosis in the ED as in other clinical settings where more extensive clinical data are available. The accuracy is not sufficiently high to permit the use of these programs as an arbiter in any individual case. However, they may be useful, prompting additional investigation in particularly difficult cases.
- clinical diagnosis
- decision support systems
- medical informatics
Decision support algorithms and software have been developed in an attempt to improve decision making in the emergency department (ED). However, the usefulness of these tools is limited by their narrow scope, as they are usually designed to improve diagnostic accuracy in the analysis of specific problems such as chest pain or abdominal pain.1–5 The scope of emergency medicine is wide. General decision support systems such as Quick Medical Reference (QMR) and Iliad can help the ED physician consider remote diagnostic possibilities in a time and resource efficient manner. Previous studies of these programs have focused on difficult, inpatient, or paper cases developed from a medical record after extensive clinical and laboratory data are available.6–9 No study has assessed how these programs perform using the limited amount of data that are collected during an ED visit. The purpose of this study is to evaluate how well Quick Medical Reference (QMR Version 3.82, Knowledge Base 10–07–1998 Copyright University of Pittsburgh and The Hearst Corporation) and Iliad (Version 4.5 Copyright 1996 Applied Medical Informatics) parallel physician decision making in common ED situations.
QMR and Iliad (named for the originator, Homer Warner at The University of Utah) are “expert systems”. Expert systems are designed to emulate the solutions to problems that one might expect from a human expert. In the cases of QMR and Iliad, medical symptoms, signs, and laboratory results are entered and compared with known disease entities, and a differential diagnosis is generated. Iliad uses Bayesian analysis while QMR uses non-Bayesian algorithms.6 Both QMR and Iliad allow for an adjustment to reflect the prevalence of a disease in the population. For example, headache and fever may indicate meningitis, but because the prevalence of influenza in the population is higher, influenza would be listed as the more probable diagnosis.
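The effect of prevalence on a Bayesian ranking can be illustrated with a minimal sketch. It is not Iliad's actual algorithm or knowledge base: the function name `posterior_scores`, the prevalence figures, and the finding frequencies are all hypothetical values chosen for illustration, and the calculation assumes that findings are conditionally independent given the disease and that only the listed diseases are possible.

```python
def posterior_scores(priors, likelihoods, findings):
    """Posterior probability of each disease given the findings.

    priors:      disease -> prevalence used as the prior (hypothetical values)
    likelihoods: disease -> {finding -> P(finding | disease)} (hypothetical values)
    Assumes findings are conditionally independent given the disease.
    """
    unnormalised = {}
    for disease, prior in priors.items():
        score = prior
        for finding in findings:
            score *= likelihoods[disease][finding]
        unnormalised[disease] = score
    total = sum(unnormalised.values())
    # Normalise over the listed diseases only (closed set assumption).
    return {disease: score / total for disease, score in unnormalised.items()}


# Illustrative numbers only; they are not taken from any knowledge base.
priors = {"influenza": 0.05, "meningitis": 0.0001}
likelihoods = {
    "influenza": {"headache": 0.6, "fever": 0.9},
    "meningitis": {"headache": 0.9, "fever": 0.95},
}

posterior = posterior_scores(priors, likelihoods, ["headache", "fever"])
ranked = sorted(posterior, key=posterior.get, reverse=True)
print(ranked)
```

With these made-up numbers, headache and fever are each somewhat more frequent in meningitis than in influenza, yet influenza is ranked first because its assumed prevalence is several hundred times higher.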
One limitation of these expert systems is that they only encompass a finite set of illnesses, symptoms, and signs. As more entities are added to their databases, the accuracy and relevance of the differential diagnosis generated can be expected to improve. Additionally, as they are designed by humans, they may reflect the biases of the authors and put more weight on one finding (for example, fever) than on another (for example, headache). To minimise the risk of bias, the symptom complex representative of a disease is generally arrived at by consensus (personal communication, Homer Warner).
One scenario in which these programs are useful is when the investigator is presented with an unusual constellation of symptoms or a presentation about which they have limited knowledge. For example, the differential diagnosis of a monoarticular arthritis includes not only common illnesses such as gout, pseudogout, and gonococcal arthritis but also more obscure entities that may not readily come to mind such as brucellosis. An expert system can help by expanding one’s differential diagnosis and suggesting other avenues of pursuit including what questions to ask as well as the most cost effective method of evaluating the patient.
A convenience sample of 25 patients was selected from all patients seen in our ED, a tertiary care academic medical centre. The study period was from 15 July to 15 August 2001. Exclusion criteria included age under 18, residence in a correctional facility, inability to give consent, psychiatric complaints, and inability to communicate in English. Patients were also excluded if there was no diagnostic question (for example, trauma such as lacerations, fractures, corneal abrasions, etc).
After consent was obtained, clinical information about the patient visit was prospectively collected. The basic information was recorded by ED examiners (generally a resident or supervised medical student) on a structured data form (T-System, Copyright 2001, www.tsystem.com). To capture any other data that entered into the decision making process, one of the researchers (DVS) accompanied all examiners during all contacts with each patient. All of these contacts were audiotaped and relevant clinical data from tape transcriptions were abstracted and entered into a structured data form. All other entries in the medical record including dictated notes, laboratory results, etc, were reviewed after the visit and any additional data gleaned were added to the case abstract. Only data available in the ED were included in the case abstracts. Thus, the ED physician EKG and radiograph interpretations were used in the case abstract. One of the researchers (MG) was available to assist in interpreting terminology or other data and reviewed all of the cases. The final ED discharge diagnoses from the medical records are listed in table 1.
All of the clinical information including history, physical findings, laboratory findings, radiographic findings, EKG findings, etc, was entered into QMR and Iliad. The differential diagnosis generated by the computer programs was compared with the final diagnosis of the ED physicians, all of whom were (and are) academic faculty in emergency medicine. The performance of the computer programs was determined by counting the number of cases in which the attending staff’s diagnosis was listed on the programs’ differential list. As the purpose of this study was to determine how well QMR and Iliad perform using only the information collected in the ED, any information obtained while the patient was an inpatient or at subsequent outpatient visits was not considered. Likewise, the “final” diagnosis made in the hospital or at subsequent visits was not considered. This design was chosen because the purpose of this study was not to determine the accuracy of ED diagnoses but rather to see how QMR and Iliad performed compared with experienced ED physicians. The Human Subjects committee at our institution approved this protocol.
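The performance measure described above (whether the ED diagnosis appears anywhere in a program's differential, or within its top 10 or top five entries) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the function name `top_k_rate` and the four example cases are hypothetical, and matching by exact string is a simplification, since deciding whether a program's diagnosis corresponds to the physician's diagnosis would in practice require clinical judgement.

```python
def top_k_rate(cases, k=None):
    """Fraction of cases whose reference diagnosis appears in the first k
    entries of the program's ranked differential (whole list if k is None).

    cases: list of (reference_diagnosis, ranked_differential) pairs.
    Matching is by exact string, which is a simplifying assumption.
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = 0
    for reference, differential in cases:
        shortlist = differential if k is None else differential[:k]
        if reference in shortlist:
            hits += 1
    return hits / len(cases)


# Hypothetical cases for illustration; these are not study data.
cases = [
    ("pneumonia", ["bronchitis", "pneumonia", "asthma"]),
    ("appendicitis", ["gastroenteritis", "diverticulitis", "appendicitis"]),
    ("migraine", ["tension headache", "sinusitis"]),
    ("pyelonephritis", ["pyelonephritis", "cystitis"]),
]

anywhere = top_k_rate(cases)      # reference diagnosis anywhere in the list
top_two = top_k_rate(cases, k=2)  # reference diagnosis within the first two
top_one = top_k_rate(cases, k=1)  # reference diagnosis ranked first
print(anywhere, top_two, top_one)
```

The same function applied with `k=10` and `k=5` would give the top 10 and top five rates for a set of abstracted cases.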
The results are summarised in table 2. Each case took from 20 to 40 minutes to input.
When defining success as having the ED physicians’ diagnosis within the top five generated by QMR and Iliad, the programs were successful in about one third of the cases. Overall, the ED physicians’ diagnosis appeared somewhere in the differential diagnosis with about the same frequency that has been found in other studies (around 50%–70%).6 Thus, while these programs may be helpful in a general sense, applying them as an arbiter in any particular case may be problematic. The considerable length of the differential produced by the programs, often greater than 30 diagnoses, and the length of time it takes to input a case may hinder their usefulness. The strength of these programs is their ability to expand the differential diagnosis and make suggestions about further testing and evaluation.10,11 This can be done by entering a few key findings. Further study of decision support software for this purpose in the ED is warranted.
Several limitations in the design of these programs became clear during this study. The type of information that can be entered into the programs is limited. Iliad and QMR do not take into account the drugs that a patient may be taking. The ability to input the duration of signs and symptoms is also limited, especially with QMR, and the programs are unable to account for the sequence of symptom development. Other findings are simply not in the programs’ vocabulary, despite trying multiple synonyms, and therefore cannot be entered as part of a case. Further development of these programs may mitigate these problems.
The main limitation of this study is the reliance on the ED physicians’ diagnosis as the “criterion standard”. The final diagnosis after hospitalisation or subsequent outpatient visits may or may not differ from the ED diagnosis. However, this is after additional information has been collected and additional time has elapsed. We concentrated on how these programs perform given the limited information collected in the ED. All of the physicians involved in the study were academic emergency physicians at a major teaching university. So, even though the final, post-hospitalisation, diagnosis may vary from that made in the ED, this study reflects the best available diagnosis in the ED.
In conclusion, diagnostic decision support software has the same success in finding the “correct” diagnosis in the ED as has been found in other clinical settings. The accuracy is not sufficiently high to permit the use of these programs as an arbiter in any individual case. However, they can be used to broaden the differential diagnosis.
Conflicts of interest: none declared.