Diagnostic errors related to acute abdominal pain in the emergency department
  1. Laura Medford-Davis (1)
  2. Elizabeth Park (2)
  3. Gil Shlamovitz (3)
  4. James Suliburk (4)
  5. Ashley ND Meyer (5)
  6. Hardeep Singh (5)

  (1) Department of Emergency Medicine, Robert Wood Johnson Foundation Clinical Scholars, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  (2) Section of Emergency Medicine, Baylor College of Medicine and Harris Health System, Ben Taub General Hospital Emergency Center, Houston, Texas, USA
  (3) Department of Emergency Medicine, University of Southern California Keck School of Medicine, Los Angeles, California, USA
  (4) Michael E DeBakey Department of Surgery, Baylor College of Medicine and Harris Health System, Houston, Texas, USA
  (5) Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center and Baylor College of Medicine, Houston, Texas, USA

Correspondence to Dr Laura Medford-Davis, Robert Wood Johnson Foundation Clinical Scholars, Department of Emergency Medicine, University of Pennsylvania, 1310 Blockley Hall 423 Guardian Drive, Philadelphia, PA 19104-6021, USA; medford.davis{at}


Objective Diagnostic errors in the emergency department (ED) are harmful and costly. We reviewed a selected high-risk cohort of patients presenting to the ED with abdominal pain to evaluate for possible diagnostic errors and associated process breakdowns.

Design We conducted a retrospective chart review of ED patients >18 years at an urban academic hospital. A computerised ‘trigger’ algorithm identified patients possibly at high risk for diagnostic errors to facilitate selective record reviews. The trigger determined patients to be at high risk because they: (1) presented to the ED with abdominal pain, and were discharged home and (2) had a return ED visit within 10 days that led to a hospitalisation. Diagnostic errors were defined as missed opportunities to make a correct or timely diagnosis based on the evidence available during the first ED visit, regardless of patient harm, and included errors that involved both ED and non-ED providers. Errors were determined by two independent record reviewers followed by team consensus in cases of disagreement.

Results Diagnostic errors occurred in 35 of 100 high-risk cases. Over two-thirds had breakdowns involving the patient–provider encounter (most commonly history-taking or ordering additional tests) and/or follow-up and tracking of diagnostic information (most commonly follow-up of abnormal test results). The most frequently missed diagnoses were gallbladder pathology (n=10) and urinary infections (n=5).

Conclusions Diagnostic process breakdowns in ED patients with abdominal pain most commonly involved history-taking, ordering insufficient tests in the patient–provider encounter and problems with follow-up of abnormal test results.

  • errors
  • quality assurance
  • abdomen- non trauma
  • safety
  • diagnosis

Key messages

What is already known on this subject?

  • Diagnostic errors lead to substantial patient safety concerns in the Emergency Department (ED) but methods to study and learn from them are underdeveloped.

  • Abdominal pain presents a diagnostic challenge in the ED and is often associated with unscheduled return visits.

What might this study add?

  • An electronic “trigger” algorithm based on unscheduled return visits followed by rigorous record review methodology identified cases of abdominal pain at high risk for diagnostic error.

  • Over two-thirds of diagnostic errors involved breakdowns in the patient–provider encounter (most commonly history-taking or ordering additional tests) and/or follow-up and tracking of diagnostic information (most commonly follow-up of abnormal test results).

  • A triggered review methodology identified opportunities for process improvement and might be useful for other EDs considering measurement and reduction of diagnostic errors.

Introduction


Diagnostic errors are estimated to affect 12 million adults in the USA annually in the outpatient setting.1 Data from malpractice claims suggest that of all medical errors, diagnostic errors are the most frequent and expensive, and they contribute to the most morbidity and mortality.2 ,3 While their frequency in the emergency department (ED) is unknown,4 an ED-specific malpractice study suggests that they are a significant issue in the emergency setting. Almost half (47%) of ED claims and 62% of payouts are due to diagnostic errors, and these errors lead to an average of US$295 000 more per payout compared with other types of errors.5

The ED is an inherently risky environment where physicians see many sick and unfamiliar patients while being interrupted every 3–6 min.6–8 An estimated 129.8 million annual ED visits combined with a conservative 5% estimate of diagnostic error rate in outpatient care suggests millions of incorrect diagnoses could be made in EDs each year.9 ,10 Though many diagnostic errors never lead to harm, the rate of this type of error and its consequences may be underestimated and understudied.1 While unscheduled return visits to the ED within 72 h are commonly used as a trigger to find these errors,11 ,12 evidence suggests that the revisits associated with poor quality and error may occur up to 9 days after the initial visit.13 Reviews of unscheduled ED return visits estimate that 12%–25% of patients returning had an incorrect initial diagnosis.14 ,15

While several reviews of return visits have found abdominal pain to be a major factor associated with unscheduled returns (n=10),15 prior studies of repeat visits to the ED and adverse events have only focused on chest pain (n=13), psychiatric complaints (n=10) and chronic lung disease (n=8). Abdominal pain is the most common chief complaint16 and also a particular area of risk for the ED physician because several abdominal conditions can have uncommon or unusual presentations.

The primary objective of this study was to determine the types and origins of diagnostic errors in a high-risk cohort of patients who presented with abdominal pain to the ED. A secondary objective was to test methods of error identification and review in the ED that others could use for future investigation within their own institutions.

Methods


Study design and setting

We conducted a retrospective chart review study of patients seen in the ED of a large tertiary hospital affiliated with the Baylor College of Medicine in the USA. Providers include over 25 full-time ED physicians (ie, emergency medicine residency trained and working in this ED about 28 clinical hours per week), 20 full-time mid-level providers (physician assistants and nurse practitioners who independently treat and discharge patients with complaints triaged as less severe, but seek help from ED physicians as needed), 30 part-time providers (ie, emergency medicine residency-trained physicians and mid-level providers who work <28 h/week), 42 emergency medicine residents and a rotating cadre of off-service interns from other specialities. All surgical specialities are available in the hospital to consult on patients in the ED when requested by the ED providers. The hospital system has a comprehensive integrated electronic health record (EHR) and a large network of primary and speciality clinics. The patient population is ethnically and racially diverse, with one-third falling below 200% of the federal poverty level and nearly three-quarters being uninsured. Typically, uninsured patients are asked to pay anywhere from several hundred to several thousand dollars in cash up front for physician visits, surgeries and hospitalisations accessed outside of the ED. Uninsured patients have access to emergent care through the ED without payment, but receive a bill after treatment.

Study protocol

This study was approved by the local institutional review board. We developed an electronic trigger algorithm to identify patients who visited the ED between May 2011 and May 2013 (index ED visit) and who met two criteria that we believed would make them at high risk for diagnostic error of an abdominal complaint: (1) had ‘abdominal pain’ in the provider's ‘History of Present Illness’ or in the diagnosis codes, or had a lipase laboratory test or an abdominal image (kidney, ureter and bladder (KUB) X-ray, ultrasound or CT) performed and (2) were subsequently discharged home from the ED and then had a return ED visit that led to hospitalisation between 1 and 10 days after the initial visit (figures 1 and 2). We included lipase because nurses order this test as a triage protocol for abdominal pain and while nurses are also authorised to order basic metabolic panels, complete blood counts, liver function tests, urinalysis and pregnancy tests, lipase is the most specific. In cases where the same patient made two or more pairs of visits that triggered the algorithm, only the earliest visit pair was included. The electronic trigger algorithm identified 621 unique patients. This list was sorted using the Microsoft Excel random number generator, and records of these patients were briefly reviewed in order. The review excluded patients who did not actually have a chief complaint of abdominal pain on chart review, or had traumatic, long-standing (>6 weeks) or dialysis-related presentations of abdominal pain. We, thus, reviewed 260 charts to find the first 100 eligible patients, a sample size similar to prior diagnostic error work,6 ,17 as well as feasible and practical given the exploratory aims of this study. We did not plan to make any error frequency estimations based on this sample; rather, we aimed to gather an adequate number of errors to gain insights for quality improvement purposes.
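
For readers who want to adapt the trigger to their own data, the two criteria described above could be sketched roughly as follows. This is an illustrative Python sketch only, not the study's actual algorithm: the `Visit` record, its field names, the disposition labels and the seeded shuffle (standing in for the Excel random number generator) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
import random


@dataclass(frozen=True)
class Visit:
    # All field names and disposition labels are hypothetical.
    patient_id: str
    visit_date: date
    disposition: str            # "discharged_home" or "admitted"
    abdominal_pain_noted: bool  # History of Present Illness or diagnosis codes
    lipase_ordered: bool
    abdominal_imaging: bool     # KUB X-ray, ultrasound or CT


def meets_index_criteria(v):
    """Criterion 1 plus discharge home from the index ED visit."""
    abdominal_flag = (v.abdominal_pain_noted or v.lipase_ordered
                      or v.abdominal_imaging)
    return abdominal_flag and v.disposition == "discharged_home"


def find_trigger_pairs(visits):
    """Return {patient_id: (index_visit, return_visit)}.

    Only the earliest qualifying visit pair per patient is kept.
    """
    by_patient = {}
    for v in sorted(visits, key=lambda v: v.visit_date):
        by_patient.setdefault(v.patient_id, []).append(v)

    pairs = {}
    for pid, pvisits in by_patient.items():
        for i, index_visit in enumerate(pvisits):
            if not meets_index_criteria(index_visit):
                continue
            # Criterion 2: return visit leading to hospitalisation
            # between 1 and 10 days after the index visit.
            match = next(
                (r for r in pvisits[i + 1:]
                 if r.disposition == "admitted"
                 and 1 <= (r.visit_date - index_visit.visit_date).days <= 10),
                None,
            )
            if match is not None:
                pairs[pid] = (index_visit, match)
                break
    return pairs


def random_review_order(patient_ids, seed=0):
    """Shuffle triggered patients into a reproducible review order."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids
```

The manual exclusions (no true chief complaint of abdominal pain; traumatic, long-standing or dialysis-related pain) are deliberately left out of the sketch, since in the study they were applied by chart review rather than electronically.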

Figure 1

Patient selection criteria flowsheet. ED, emergency department.

Figure 2

Number of days between emergency department (ED) visits: error versus non-error cases.

Two emergency physician ‘primary’ reviewers independently reviewed patient charts to determine the presence or absence of error using a standardised data abstraction form modified from previous studies.17 Diagnostic error studies traditionally have had very low reviewer agreements, so disagreements between primary reviewers about the presence/absence of error were independently reviewed by a surgeon and an additional emergency physician.17–20 If these two secondary reviewers disagreed, the case was discussed in a meeting that included the primary and secondary reviewers and two diagnostic error researchers to reach a team consensus. This serial consensus methodology overcame some of the limitations of reliably measuring diagnostic errors.21
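
The serial consensus steps described above can be expressed as a small decision function. The following Python sketch is hypothetical: the function name, argument layout and stage labels are assumptions for illustration, and each reviewer judgement is reduced to a single boolean (error present or absent).

```python
from typing import Optional, Tuple


def adjudicate(primary: Tuple[bool, bool],
               secondary: Optional[Tuple[bool, bool]] = None,
               team_consensus: Optional[bool] = None) -> Tuple[bool, str]:
    """Return (error_present, stage_at_which_the_case_was_decided)."""
    # Stage 1: two primary reviewers judge the case independently.
    if primary[0] == primary[1]:
        return primary[0], "primary"

    # Stage 2: disagreements go to two independent secondary reviewers.
    if secondary is None:
        raise ValueError("primary reviewers disagree; secondary review needed")
    if secondary[0] == secondary[1]:
        return secondary[0], "secondary"

    # Stage 3: remaining disagreements are settled by team discussion.
    if team_consensus is None:
        raise ValueError("secondary reviewers disagree; team consensus needed")
    return team_consensus, "team"
```

Recording the stage alongside the verdict makes it straightforward to report agreement at each step of the review.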

Similar to prior published work, we defined diagnostic errors as missed opportunities to make a correct or timely diagnosis based on the available evidence, regardless of patient harm.19 ,22 Operationally, these were judged to have occurred if adequate data to suggest the final, correct diagnosis were already present at the index ED visit or if documented abnormal findings at the index visit should have prompted additional evaluation that would have revealed the correct, ultimate diagnosis. Thus, errors were determined to occur only when missed opportunities to make an earlier diagnosis were present at the index ED visit.19 Diagnostic errors included any missed opportunity during the index ED visit, whether missed by the ED provider or by a consulting provider, patient or other staff member such as a radiologist. Reviewers were trained on how to identify diagnostic error cases based on prior studies before beginning the review process, and practised the review process on 10 cases.

Data collected from the EHR included patient and visit characteristics, presenting symptoms, past medical history as documented in the providers’ notes, abnormal test results, differential diagnoses and final diagnoses as documented in the note or encounter, discharge instructions and follow-up plans from the initial visit. Information collected from the second visit included the evolution of the patient's presentation, including details on symptom progression and new test results, and final diagnoses as documented in the hospital discharge summary.


In addition to determining the presence or absence of diagnostic error, the primary reviewers collected details about the error, if discovered. To identify the types of diagnostic process breakdowns underlying these errors, we used an existing five-level classification developed by Singh et al19 and categorised process breakdowns into the following categories: patient–provider encounter, performance and/or interpretation of diagnostic tests, follow-up and tracking of diagnostic information, referral-related processes and patient-specific processes. Patient–provider encounter issues included problems with history, physical examination, failure to review previous documentation and problems ordering diagnostic tests for further work-up in the ED (such as failure to order an imaging study while in the ED). We also used an eight-point ‘Human Error Consequence Scale’ previously validated by Singh et al19 in diagnostic error research to collect data about each error's potential for harm, with one indicating no harm/no inconvenience and eight indicating the potential for immediate or inevitable death.


Descriptive statistics were used to describe characteristics of the diagnostic error cases, including the clinical conditions involved, as well as associated process breakdowns and the potential for harm. Characteristics were compared between error and non-error cases using t tests for continuous variables and χ2 or Fisher's exact tests, where appropriate, for categorical variables. Then, a multivariable logistic regression was performed including the characteristics that differed significantly between the error and non-error cases on univariate testing. All data were analysed using IBM SPSS Statistics V.22.
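
As an illustration of the categorical comparison, a two-sided Fisher's exact test for a 2×2 table can be written with the Python standard library alone. This is a sketch under stated assumptions, not the SPSS procedure the study used: the function name is hypothetical, and the two-sided p value follows the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as extreme as the observed one; the small
    # tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Example: abnormal results inadequately addressed at the initial visit,
# 15 of 35 error cases versus 0 of 64 non-error cases. The denominator of
# 64 is inferred from the reported percentages and is an assumption.
p_value = fisher_exact_two_sided(15, 20, 0, 64)
```

With these counts the p value is far below 0.001, in line with the significance level reported in the Results for this comparison.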


Results

Of 100 cases reviewed in detail, 35 were ultimately determined to contain diagnostic errors. The most frequently missed diagnoses were acute gallbladder pathology (n=10), urinary system infections (n=5), diverticulitis (n=2), small bowel obstruction (n=2), appendicitis (n=2), cancer (n=2) and ectopic pregnancy (n=2). Selected examples of errors where both primary reviewers were in agreement on the presence of error are presented in table 1.

Table 1

Examples of errors with reviewer agreement and the process breakdowns involved (all patients have been referred to as male, and ages have been masked).

Primary reviewer inter-rater agreement was 67.7% on the presence or absence of error; secondary reviewers reviewed 33 cases where the primary reviewers disagreed and achieved a 54.8% agreement for presence/absence of error in these cases. The remaining 18 cases of disagreement were discussed by the team, of which, 8 were determined to have errors by consensus.

All high-risk cases had similar patient and initial ED visit characteristics, except error cases had shorter initial ED visit lengths of stay from arrival to ED departure (p=0.002), were more likely to be seen by a mid-level provider than a physician (p=0.013), made fewer visits to the ED in the 3 months preceding the initial visit (p=0.002) and were less likely to report a history of substance abuse (p=0.03) (table 2). When laboratory test results were abnormal, error cases were less likely than non-error cases to have the abnormal laboratory results documented as reviewed in the provider's note (25.7% (n=9) vs 45.3% (n=29), p=0.004), and more likely to have the abnormal results inadequately addressed at the initial visit (42.9% (n=15) vs 0.0% (n=0), p<0.001). Error cases were more likely to have a differential diagnosis that did not include the final diagnosis (37.1% (n=13) vs 25.0% (n=16), p=0.031). Using a multivariable logistic regression, error cases still differed from non-error cases in having shorter initial ED visit lengths of stay (OR 0.91, CI 0.84 to 1.00, p=0.045) and higher likelihood of being seen by a mid-level provider (OR 2.68, CI 1.01 to 7.13, p=0.048).

Table 2

Demographics of error versus non-error cases

Of the 35 error cases, most (74.3%) had multiple types of process breakdowns based on the 5-level classification; nine (25.7%) error cases had three breakdowns and 17 (48.6%) had two breakdowns (table 3). Two-thirds had problems with the patient–provider encounter (68.6%, n=24), most frequently related to failure to order sufficient diagnostic tests for work-up (48.6%, n=17), or problems collecting the patient history (40.0%, n=14). Additionally, almost three-fourths had problems with follow-up and tracking of diagnostic information (74.3%, n=26), most frequently related to follow-up of abnormal diagnostic test results (65.7%, n=23). Nine (25.7%) cases had issues related to diagnostic test performance/interpretation. Additional breakdowns included problems with the consultation process (11.4%, n=4) and patient-related issues (14.3%, n=5) such as failure to mention key symptoms. The potential severity of injury was categorised as considerable to severe in 57% of cases, with the most frequent risk being considerable harm (figure 3).
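
The percentages in this paragraph follow directly from the counts out of 35 error cases. A short Python sketch of the arithmetic is given below; the dictionary keys are shorthand labels chosen for the example, not the classification's official wording.

```python
ERROR_CASES = 35

# Counts as reported in the text; a single case can fall into several
# categories, so the percentages do not sum to 100.
breakdown_counts = {
    "patient-provider encounter": 24,
    "insufficient diagnostic tests ordered": 17,
    "problems collecting the history": 14,
    "follow-up and tracking": 26,
    "follow-up of abnormal test results": 23,
    "test performance/interpretation": 9,
    "consultation process": 4,
    "patient-related issues": 5,
}


def percent(count, total=ERROR_CASES):
    """Percentage of error cases, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in breakdown_counts.items():
    print(f"{label}: {count}/{ERROR_CASES} = {percent(count)}%")

# Cases with multiple breakdown types: 9 with three plus 17 with two.
multiple_breakdowns = percent(9 + 17)
```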

Table 3

Errors in five dimensions

Figure 3

Potential severity of injury associated with 35 error cases.

Discussion


We used an electronic algorithm to identify a cohort of patients with abdominal pain who we considered at high risk for diagnostic errors. We then used a rigorous record review methodology to identify errors and processes for quality improvement. The majority of breakdowns involved two diagnostic processes: the patient–provider encounter, where breakdowns often involved failures in gathering history or ordering sufficient diagnostic tests for further work-up; and failures in follow-up and tracking of diagnostic information, where breakdowns frequently involved failure to review or act on abnormal test results. Similar areas of process breakdown have been documented in ambulatory care19 ,23 as well as in malpractice claims in the ED setting.5 ,6 Our study documents the high frequency of these process breakdowns in the ED in cases of abdominal pain-related diagnostic error.

We used a robust methodology to identify and determine diagnostic errors that could also be useful to others pursuing error measurement for diagnostic quality improvement. First, we leveraged the EHR to focus on a selective high-risk cohort of patients. Few studies have used electronic trigger methods for error identification,20 and none in ED settings. These methods could be useful to inform future research as well as local quality improvement efforts related to diagnostic errors where measurement methods are fairly limited.24 Second, the serial consensus method between ED physicians, surgeons and diagnostic error experts was a relatively novel method for case evaluation and added rigour to diagnostic error determination, which typically has had very low agreement in previously published studies.6 ,18 ,19 Third, we used an existing classification of process breakdowns and applied this to understand the origins of these errors.

Our study underscores the importance of addressing measurement-related challenges to advance the understanding and improvement of diagnostic errors.24 In the absence of existing gold standards, different physician specialists often perceive quality of care very differently in the same case, which we also found while reviewing cases. At times, even when a consensus was reached between the secondary emergency physician reviewer and the surgeon, each gave a different reason why they believed the error had occurred. Measurement of diagnostic error is additionally challenging because each physician deals with uncertainty very differently, even within the same speciality. Often factors other than training such as culture, norms and personal preferences might sway their judgements. Further work in this area needs to develop more objective criteria for diagnostic errors, and focus on how to identify missed opportunities that could be examined for improvement and prevention.22

Failure to follow up abnormal laboratory results was a common breakdown. A frequent example was the failure to order abdominal imaging for patients with abnormal liver function tests. While emergency physicians typically do not follow up abnormal tests themselves beyond the ED encounter, they are expected to either order appropriate further investigations during the initial ED visit or to arrange outpatient follow-up for patients to have the abnormality rechecked. Problems in gathering patient history were also common breakdowns. Examples included failing to elicit a history of similar episodes (such as in a patient with known porphyria), not using a language interpreter and eliciting an inadequate or incomplete history, especially one that contradicted the nursing notes, the latter being accurate as discovered by information gathered during the second visit. Notably, missing information documented in the nursing notes was one of the reasons attributed to the misdiagnosis of the 2014 Ebola case in a Dallas, Texas, ED.25

Although little actual harm was documented in these 35 error cases, more than half were judged to have the potential for significant harm. Two errors were judged to be life threatening, the first a missed ectopic pregnancy that was ruptured on return and the second a missed spontaneous bacterial peritonitis in a patient with cirrhosis who returned with altered mental status. High rates of harm in malpractice claims6 signify the potential negative consequences of diagnostic error, making its reduction a high-priority area for quality improvement.

Diagnostic errors are often thought to stem from bias or from overreliance on type 1, or automatic, heuristic thinking rather than deliberate type 2 reasoning.26 Interventions could include deliberate consideration of a broad differential diagnosis prior to final disposition, learned cognitive forcing strategies or use of a checklist, all of which force the mind to slow down prior to making a final decision.26 ,27 Cognitive error concepts and mitigation strategies should be integrated into medical and residency education both in the classroom and through simulation. Medical culture should encourage seeking assistance from fellow ED colleagues, consultants or diagnostic aid tools rather than praising independence.26 ,28–30 Other systems-based solutions such as chief complaint-triggered clinical decision support, EHR tools, lowering the ED patients per hour/throughput expectations and involving patients in the diagnostic error identification and reduction process are potential solutions that warrant further evaluation.31

Our study has several limitations. It has a small sample size and represents an exploratory analysis of the origins of certain types of diagnostic errors in emergency medicine, rather than a prevalence or hypothesis testing study. While it includes patient factors in the origins of error, it does not fully examine other contributions to error that may have occurred in the pre-ED setting.32 However, it builds methodologically on similar studies done in other settings.5 ,6 ,18 ,19 ,23 ,33 The inter-rater agreement was relatively low, but it was similar to prior studies about the presence of error.6 ,18 ,19

Retrospective studies such as ours are also subject to hindsight bias and limited by the data recorded in the patient chart at the time of evaluation. For example, while we found a significant association between diagnostic errors and the failure to document a broad differential diagnosis or abnormal test results as reviewed, providers may have considered the correct diagnosis and reviewed laboratory results, but not documented these actions in their notes. Thus, we are unsure if these failures represent documentation problems alone or premature cognitive closure without consideration of a sufficiently broad differential and failure to notice or recognise an abnormal test result.

Furthermore, the types of diagnoses that were missed may not be generalisable to other patient populations with abdominal pain. We also might not have captured patients who returned to a different ED, although due to a hospital-specific financing scheme, many patients primarily obtain care within this hospital system. Patients with atypical presentations of abdominal pathology such as back or hip pain would not have been identified by the trigger algorithm unless a provider suspected abdominal pathology and ordered lipase or abdominal imaging. Nevertheless, the analysis is comprehensive, and has identified common high-risk processes that warrant attention in order to reduce diagnostic errors in the ED.

In conclusion, process breakdowns leading to diagnostic errors in patients presenting with abdominal pain to the ED most commonly involved incomplete history-taking, ordering insufficient tests in the patient–provider encounter and problems with follow-up of abnormal test results. Future investigations into diagnostic error in the ED setting should analyse and address the contributory factors underlying these process breakdowns and test interventions to reduce them.


We thank Traber Giardina Davis for project assistance.



  • Twitter Follow Laura Medford-Davis at @MedfordDavis and Hardeep Singh @HardeepSinghMD.

  • Contributors LM-D and HS conceived the study. EP and LM-D were the primary case reviewers. GS and JS were the secondary case reviewers. ANDM performed the statistical analysis. LM-D drafted the manuscript, and all authors contributed substantially to its revision. LM-D takes responsibility for the paper as a whole.

  • Funding HS is supported by the VA Health Services Research and Development Service (CRE 12-033; Presidential Early Career Award for Scientists and Engineers USA 14-274), the VA National Center for Patient Safety and the Agency for Health Care Research and Quality (R01HS022087). This work was supported in part by the Houston VA HSR&D Center for Innovations in Quality, Effectiveness and Safety (CIN 13-413).

  • Disclaimer The views expressed in this article are those of the authors, and do not necessarily represent the views of the Department of Veterans Affairs or any other funding agency.

  • Competing interests None declared.

  • Ethics approval This study was approved by the local Institutional Review Boards of Baylor College of Medicine and Harris Health System.

  • Provenance and peer review Not commissioned; externally peer reviewed.
