Performance of early warning and risk stratification scores versus clinical judgement in the acute setting: a systematic review
  1. Lars Ingmar Veldhuis (1, 2)
  2. Milan L Ridderikhof (1)
  3. Lyfke Bergsma (3)
  4. Faridi Van Etten-Jamaludin (4)
  5. Prabath WB Nanayakkara (5)
  6. Markus Hollmann (2)

Author affiliations
  1. Emergency Medicine, Amsterdam UMC Locatie AMC, Amsterdam, The Netherlands
  2. Anaesthesiology, Amsterdam UMC Locatie AMC, Amsterdam, The Netherlands
  3. Internal Medicine, Amsterdam UMC Locatie VUmc, Amsterdam, The Netherlands
  4. Clinical Library, Amsterdam UMC Locatie AMC, Amsterdam, The Netherlands
  5. Section Acute Medicine, Department of Internal Medicine, Amsterdam Universitair Medische Centra, Amsterdam, The Netherlands

Correspondence to Dr Milan L Ridderikhof, Emergency Medicine, Amsterdam UMC - Locatie AMC, Amsterdam, 1105 AZ, Netherlands; m.l.ridderikhof{at}amsterdamumc.nl

Abstract

Objective Risk stratification is increasingly based on Early Warning Score (EWS)-based models instead of clinical judgement. However, it is unknown how risk-stratification models and EWS perform compared with the clinical judgement of treating acute healthcare providers. Therefore, we performed a systematic review of all available literature comparing the clinical judgement of healthcare providers with the use of risk-stratification models in predicting patients’ clinical outcome.

Methods Studies comparing clinical judgement and risk-stratification models in predicting outcomes in adult patients presenting at the ED were eligible for inclusion. Outcomes included the need for intensive care unit (ICU) admission; severe adverse events; clinical deterioration and mortality. Risk of bias among the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.

Results Six studies (6419 participants) were included, of which four were judged to be at high risk of bias. Only descriptive analysis was performed, as a meta-analysis was not possible because of the small number of included studies and high clinical heterogeneity. The performance of both clinical judgement and risk-stratification models was moderate in predicting mortality, deterioration and need for ICU admission, with areas under the curve between 0.70 and 0.89. The performance of clinical judgement did not significantly differ from risk-stratification models in predicting mortality (n=2 studies) or deterioration (n=1 study). However, clinical judgement of healthcare providers was significantly better in predicting the need for ICU admission (n=2 studies) and severe adverse events (n=1 study) compared with risk-stratification models.

Conclusion Based on limited existing data, clinical judgement has greater accuracy in predicting the need for ICU admission and the occurrence of severe adverse events compared with risk-stratification models in ED patients. However, performance is similar in predicting mortality and deterioration.

PROSPERO registration number CRD42020218893.

  • emergency department
  • clinical management
  • risk management
  • care systems
  • clinical assessment

Data availability statement

Data are available on reasonable request. No data are available.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Risk-stratification scores are being used to improve early recognition of deteriorating patients.

  • It is unknown how risk-stratification scores perform compared with clinical judgement of acute healthcare providers.

WHAT THIS STUDY ADDS

  • In this systematic review of six studies, clinical judgement has greater accuracy in predicting the need for intensive care unit admission and severe adverse events compared with risk-stratification scores for patients in the ED.

HOW THIS STUDY MIGHT AFFECT RESEARCH

  • More studies are needed to investigate the potential benefits of risk stratification tools in the acute care chain.

Introduction

Early recognition of clinically deteriorating patients in the ED is essential in preventing severe adverse events, (unplanned) intensive care unit (ICU) admissions and death.1–4 Often, abnormal vital signs precede death and serious adverse events.5 6 Inappropriate or delayed response to abnormal vital signs and clinical deterioration increases the risk of severe adverse events, morbidity and mortality.7–9 Acute healthcare providers are trained to identify and triage patients based on their severity of illness.

Attempts to achieve earlier identification of the clinically deteriorating patient led to the introduction of Early Warning Scores (EWS) risk-stratification models.10 11 EWS are used to recognise the early signs of clinical deterioration by assessing various physiological variables (eg, BP, HR, oxygen saturation, RR, level of consciousness and temperature). Predictions regarding mortality and deterioration are increasingly based on these risk scores.12–14
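To illustrate how such a score turns bedside measurements into a risk category, the following minimal sketch computes the quick Sepsis Organ Failure Assessment (qSOFA) score, one of the models evaluated in the studies included in this review. It assumes the commonly published qSOFA criteria (RR of at least 22/min, systolic BP of 100 mm Hg or lower, and altered mentation taken here as a Glasgow Coma Scale below 15) and a positive threshold of 2 points. The class and function names are hypothetical, and the sketch is not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass
class VitalSigns:
    respiratory_rate: int  # breaths per minute
    systolic_bp: int       # mm Hg
    gcs: int               # Glasgow Coma Scale, 3 to 15


def qsofa_score(v: VitalSigns) -> int:
    """Return the qSOFA score (0 to 3): one point per criterion met."""
    points = 0
    if v.respiratory_rate >= 22:   # tachypnoea criterion
        points += 1
    if v.systolic_bp <= 100:       # hypotension criterion
        points += 1
    if v.gcs < 15:                 # altered mentation criterion
        points += 1
    return points


def is_high_risk(v: VitalSigns, threshold: int = 2) -> bool:
    """Dichotomise the score; a threshold of 2 is assumed here."""
    return qsofa_score(v) >= threshold


if __name__ == "__main__":
    patient = VitalSigns(respiratory_rate=24, systolic_bp=95, gcs=15)
    print(qsofa_score(patient), is_high_risk(patient))
```

Aggregate EWS such as MEWS follow the same pattern but assign graded points per variable rather than a single point per criterion.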

However, there are also suggestions in the literature that formal scoring systems are not superior to clinical judgement. Strict adherence to National Early Warning Score (NEWS) by general practitioners was shown to have the potential to increase (unneeded) hospital admissions.15 Classification of disease severity based on developed decision rules is inferior compared with clinical performance.16–20

These data raise the question of whether implementing risk-stratification models such as EWS can improve early recognition of critical illness or lead to a more accurate triage urgency compared with the knowledge, skills and experience (ie, clinical judgement) of experienced practitioners.

To achieve early identification of critically ill patients in acute care settings, risk-stratification models and EWS are increasingly being implemented in the ED. However, little is known about the actual benefit of these tools compared with clinical judgement for identifying the deteriorating patient in the ED.

The aim of this study was to review available data comparing prediction of patient outcomes by clinical judgement of acute healthcare providers with the use of risk-stratification models or EWS. Our primary objective was to investigate the added value of risk-stratification models and EWS compared with clinical judgement in identifying patients who need ICU admission. We hypothesise that clinical judgement of ED personnel is superior to risk-stratification scores in predicting the need for ICU admission.

Methods

This review was performed according to the methods that were described in detail in the systematic review protocol and is reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. The review protocol was registered a priori in the PROSPERO database (registration number: CRD42020218893).

Study identification

The Medline, EMBASE and Cochrane Central Register of Controlled Trials databases were searched for relevant articles from inception to November 2020. The search strategies were developed in conjunction with a clinical librarian (FVE-J). Complete search strategies can be found in online supplemental appendix 1. The WHO International Clinical Trials Registry Platform (www.who.int/ictrp/en/) and US National Institutes of Health (https://clinicaltrials.gov/) were searched to identify any unpublished ongoing trials. Reference lists of studies that met the eligibility criteria were hand-searched to identify any potentially missed studies.

Study selection

Studies were eligible for inclusion when they compared risk-stratification models or EWS with clinical judgement for adult patients (aged 18 years and above) in an acute setting, that is, prehospital and at the ED. Studies reporting the area under receiver operating characteristics or sensitivity with specificity were included. Studies reporting decisions based on protocols and studies focusing on patients already admitted to a ward were excluded. Two authors (LIV and LB) screened all titles and abstracts, and those without relevance were excluded. Full text of all remaining studies was obtained, read and assessed qualitatively using the eligibility criteria. Disagreements were resolved by discussion or a third assessor (MLR) if agreement could not be reached.
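For readers less familiar with the area under the receiver operating characteristic curve (AUC), the sketch below shows one standard way to compute it: as the proportion of patient pairs (one with the outcome, one without) in which the patient with the outcome received the higher score, counting ties as half. The function name and the example data are hypothetical and are not drawn from the included studies.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores:   risk score or clinician rating per patient (higher = more risk)
    outcomes: 1 if the outcome occurred (eg, ICU admission), else 0
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for six patients, three of whom had the outcome.
    print(round(auc([0, 1, 1, 2, 3, 3], [0, 0, 1, 0, 1, 1]), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking.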

Data extraction and risk of bias assessment

Two reviewers (LIV and LB) independently extracted data regarding study characteristics, results and the risk of bias using a data collection form. Extracted study data included type of risk-stratification score investigated, years of experience of medical personnel, study population, demographic data, number of participants and outcome (ie, ICU admission, deterioration, severe adverse events and mortality). For studies reporting more than one set of values of sensitivity and specificity, for example for different thresholds, each set of sensitivity and specificity was extracted. We used the predetermined (optimal) threshold of each risk-stratification score for data analysis according to the existing literature. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess the risk of bias and the concerns regarding applicability for each included study.

Patient and public involvement

Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Data synthesis and analysis

Studies with the same outcome were grouped, that is, mortality, ICU admission, adverse events and deterioration. Meta-analysis was planned in case sufficient data would be extracted from each study and no significant clinical heterogeneity was present. Sensitivity and specificity of the included risk-stratification score to predict the study outcome were based on predetermined cut-off points and compared with sensitivity and specificity of clinical performance.
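As a minimal sketch of the comparison described above, the following code dichotomises a score at a predetermined cut-off and derives sensitivity and specificity from the resulting 2×2 table. The function names, the cut-off and the example data are hypothetical; they only illustrate the calculation and do not reproduce any included study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            observed: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from paired predictions and outcomes."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome groups must contain at least one patient")
    return tp / (tp + fn), tn / (tn + fp)


def dichotomise(scores: Sequence[float], cutoff: float) -> list:
    """Flag a patient as high risk when the score reaches the cut-off."""
    return [s >= cutoff for s in scores]


if __name__ == "__main__":
    scores = [0, 1, 2, 3, 1, 2]                          # hypothetical score values
    outcome = [False, False, True, True, True, False]    # hypothetical outcomes
    sens, spec = sensitivity_specificity(dichotomise(scores, cutoff=2), outcome)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

The same function can be applied to a clinician's yes/no prediction in place of the dichotomised score, which is how the two approaches are placed side by side.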

Data were not pooled in case of considerable heterogeneity that could not be explained by the diversity of methodological or clinical features among the trials. However, where pooling was inappropriate or not possible, analyses are presented for illustrative purposes. Synthesis was performed using RevMan (2011).

Results

Study selection

After removing duplicates, 1080 titles and abstracts were screened for eligibility. Eventually, the full text of 31 studies was obtained and assessed by two reviewers. This resulted in the inclusion of six studies (6419 participants) (see figure 1).

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart.

Study characteristics

The included studies addressed a variety of patients: all ED patients (n=1)21; sepsis-only patients (n=3)22–24; trauma-only patients (n=1)25 and boarding patients staying in the ED and waiting for in-patient beds (n=1).26 One study reported clinical judgement of prehospital EMS nurses while the other studies included clinical judgement of ED physicians (n=3), ED nurses (n=1) or both ED physicians and nurses (n=1). Only the study evaluating judgement of the ED nurses reported years of experience of the examined staff, which varied between 6 and 9 years.26 The risk-stratification models used in the studies were the Predisposition Infection Response and Organ dysfunction (PIRO) score (n=3); the Modified Early Warning Score (MEWS) (n=2); the Mortality in ED Sepsis (MEDS) score (n=2); the quick Sepsis Organ Failure Assessment (qSOFA) score (n=1) and the Denver ED Trauma Organ Failure Score (n=1). Due to the heterogeneity among the studies, meta-analysis could not be performed.

Risk of bias analysis

Four studies (67%) were judged to be at high risk of bias on one domain of the QUADAS-2 tool (see figure 2 and figure 3). The most common risk of bias was due to retrospective interpretation of the risk-stratification score with knowledge of the clinical judgement outcome.

Figure 2

Risk of bias within studies.

Figure 3

Risk of bias and applicability concerns summary: review authors’ judgements about each domain for each included study.

ICU admission

Two studies reported prediction of ICU admission in suspected sepsis patients.22 24 de Groot et al reported a comparable AUC for clinical judgement, PIRO and MEDS, 0.81 (95% CI 0.73 to 0.90), 0.74 (95% CI 0.65 to 0.83) and 0.70 (95% CI 0.60 to 0.80), respectively with p=0.19. However, sensitivity and specificity were significantly higher for clinical judgement compared with the optimal threshold of PIRO and MEDS, all p<0.001 (table 1). Quinten et al reported good AUCs for clinical judgement in predicting ICU admission for physicians as well as for nurses; 0.866 (95% CI 0.793 to 0.938) and 0.793 (95% CI 0.700 to 0.886), respectively. However, despite not reporting a p value, the authors stated that it was not significantly better as compared with PIRO or qSOFA, with AUC of 0.752 (95% CI 0.628 to 0.876) and 0.811 (95% CI 0.718 to 0.903), respectively. Regarding performance for the optimal threshold, sensitivity for clinical judgement was better for physicians and nurses (80% and 80%) compared with PIRO and qSOFA (57.1% and 57.1%), with a comparable specificity (77.8% and 75.0% vs 89.0% and 87.2%). However, 95% CI and p values were not reported. See table 2.

Table 1

Study characteristics

Table 2

Study outcome

Severe adverse events (ICU admission, cardiac arrests and mortality)

Fullerton et al compared MEWS with clinical judgement for the prediction of ICU admission in combination with medical emergency team attendance, cardiac arrest and death for all ED patients, see table 2.21 Sensitivity was comparable with clinical judgement of the EMS nurses, 56.6% (95% CI 45.1 to 67.9) vs 61.8% (95% CI 51.0 to 72.8). However, specificity was significantly better for clinical judgement (94.1% (95% CI 93.2 to 94.9) vs 88.5% (95% CI 87.4 to 89.7)).

Mortality

Two studies reported prediction of mortality within 28 days after initial ED presentation, see table 2. de Groot et al reported that the PIRO or MEDS have similar AUC compared with clinical judgement in predicting mortality in patients with severe sepsis; 0.68 (95% CI 0.61 to 0.74), 0.68 (95% CI 0.61 to 0.74) and 0.69 (95% CI 0.60 to 0.78), respectively.23 In addition to severe sepsis, de Groot et al, reported the AUC for patients suspected of any form of infection, with a similar performance between prediction models and clinical judgement with an AUC of 0.84 (95% CI 0.73 to 0.96) for clinical judgement, 0.83 (95% CI 0.75 to 0.91) for PIRO and 0.78 (95% CI 0.70 to 0.86) for MEDS.

Quinten et al reported a similar AUC for clinical judgement of physicians and nurses in predicting mortality compared with PIRO and qSOFA: 0.866 (95% CI 0.793 to 0.938) and 0.793 (95% CI 0.700 to 0.886) vs 0.752 (95% CI 0.628 to 0.876) and 0.811 (95% CI 0.718 to 0.903); however, the p value was not reported.24

Deterioration

One study reported the prediction of deterioration of patients waiting for in-hospital beds in the ED, see table 2.26 The MEWS was incorporated into current nursing practice for patient monitoring on selected days of the week. In this study, nurses decided to notify their senior doctors based on MEWS or clinical gestalt. The senior doctor judged afterwards whether the consultation was needed or unnecessary. The MEWS group had twice the number of consultations (n=31) compared with the clinical judgement group (n=14), although this was not a significant difference. However, in the MEWS group, seven (23%) consultations were activated not based on MEWS, but on clinical judgement (eg, abnormal lab results).

Multiorgan failure

One study evaluated patients with traumatic injuries and the development of multiorgan failure within 7 days of initial ED presentation, see table 2.25 The Denver ED Trauma Organ Failure Score and clinical judgement had an AUC of 0.89 (95% CI 0.86 to 0.91) and 0.78 (95% CI 0.73 to 0.83), respectively. Despite not reporting whether the difference was significant, the 95% CI suggested a relevant difference between the groups.
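The reasoning from confidence intervals used above can be sketched as a simple overlap check. This is only a conservative heuristic and not a formal test: non-overlapping 95% CIs suggest a difference, but overlapping intervals do not rule one out. The `Estimate` type and function name are hypothetical; the numbers are the AUCs reported in this section.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    point: float
    lower: float  # lower bound of the 95% CI
    upper: float  # upper bound of the 95% CI


def intervals_overlap(a: Estimate, b: Estimate) -> bool:
    """True when the two confidence intervals share at least one value."""
    return a.lower <= b.upper and b.lower <= a.upper


if __name__ == "__main__":
    denver_score = Estimate(0.89, 0.86, 0.91)
    clinical_judgement = Estimate(0.78, 0.73, 0.83)
    # No overlap here, which is what suggests a relevant difference.
    print(intervals_overlap(denver_score, clinical_judgement))
```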

Discussion

The aim of this study was to investigate the added value of risk-stratification models or EWS compared with clinical judgement in acute patients. In total, six relevant studies comparing performance of the models with clinical judgement were identified. According to these studies, clinical judgement of ED physicians and nursing healthcare staff was significantly better in predicting the need for ICU admission and severe adverse events compared with risk-stratification models in ED patients and in the prehospital setting. However, accuracy in predicting mortality, deterioration and multiorgan failure seems to be similar between risk-stratification models and clinical judgement of acute healthcare personnel. These findings, however, are limited by the small number of studies. As all included studies were highly heterogeneous in study population, sample sizes and the use of different risk-stratification models, a meta-analysis could not be performed. Study quality was variable, with two studies rated as low risk of bias on all domains and four studies rated as high risk of bias as the risk-stratification score was retrospectively calculated.

Previous research reported several limitations of the use of EWS.27 A previous systematic review from 2014 exploring the performance of EWS concluded that, due to the heterogeneity, clear conclusions about its benefits cannot be drawn.28 The current review confirms the heterogeneity of risk-stratification models and EWS, which reduces their applicability and generalisability in general use. In addition, the actual benefits of risk-stratification models or EWS systems in the ED seem limited, probably due to the high level of training of ED healthcare staff who treat (potentially) acutely ill patients daily and the constant presence of physicians and nurses. Two of the four cases studied in this paper suggest that clinical judgement is superior in predicting patient outcome compared with risk-stratification models or EWS in the acute healthcare setting. Therefore, solely using risk-stratification models in the acute care setting seems questionable. This statement is in concordance with prior literature stating that prediction of patient outcome by decision rules is inferior compared with clinical gestalt.16–19

The included studies in this review compare the individual performance of risk-stratification models and EWS with clinical judgement. In practice, risk-stratification models are likely to be used as a tool to support or complement clinical judgement. Fullerton et al showed that combining MEWS with clinical judgement in the ED resulted in the best AUC, supporting this thought.

Implications for practice and research

This review concludes that risk-stratification models and clinical judgement have a similar performance in predicting mortality and deterioration in ED patients and patients in the prehospital setting. However, performance of clinical judgement in predicting the need for ICU admission and severe adverse events is better compared with risk-stratification models. Clinical judgement remains an important and useful tool to classify the severity of illness. Clinical judgement is partly based on (abnormal) vital variables. However, it remains unknown which aspects of clinical judgement determine its superiority compared with risk-stratification models, which should be further investigated.

Strengths and limitations

The main strength of this review is the extensive literature search done by a clinical librarian in multiple databases. Also, grey literature and reference lists of included studies were searched and selected by two independent authors. In addition, quality and risk of bias assessment was performed. Limitations of this study are the small number of included studies in combination with the high clinical heterogeneity among these studies precluding a meta-analysis. The majority of the included studies (67%) were judged to be at high risk of bias on one domain of the QUADAS-2 tool. In addition, none of the studies were randomised controlled trials. Furthermore, while all included studies were prospective cohort studies, only limited effort was made to reduce bias. For example, So et al performed a study using MEWS and clinical judgement on alternate days to predict patient outcomes. However, the healthcare professionals were not split into using MEWS or their clinical judgement, whereby subconsciously or consciously their clinical judgement may have been based on MEWS on non-MEWS days.

In addition, So et al, Vogel et al and both studies performed by de Groot et al used initial measured parameters to retrospectively calculate the sum score in their models. However, clinical judgement was obtained after several hours rather than directly at the ED presentation. During these hours at the ED, it is likely that the patients received any form of treatment. This response to ED treatment is likely to be incorporated into the clinical judgement, highly influencing the accuracy of their judgement.

In summary, this systematic review found limited data of only moderate quality to investigate the added value of risk-stratification models and EWS compared with clinical judgement of healthcare staff treating patients in the prehospital setting and in the ED. Clinical judgement has greater accuracy in predicting the need for ICU admission and severe adverse events compared with risk-stratification models or EWS for patients in the ED. However, performance is similar in predicting mortality and deterioration.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.

References

Footnotes

  • Handling editor Kirsty Challen

  • Contributors LIV designed the study, analysed and interpreted the data, drafted the article and is guarantor for overall content. FVE-J performed the literature search. LB assisted in study selection and data extraction. All other authors revised the article critically for important intellectual content and helped with the final version to be submitted.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.