Comparison of qSOFA and Hospital Early Warning Scores for prognosis in suspected sepsis in emergency department patients: a systematic review
  1. Lisa Sabir1,
  2. Shammi Ramlakhan2,
  3. Steve Goodacre1
  1. 1 School of Health and Related Research (ScHARR), The University of Sheffield, Sheffield, UK
  2. 2 Emergency Department, Sheffield Children's NHS Foundation Trust, Sheffield, UK
  1. Correspondence to Dr Lisa Sabir, School of Health and Related Research (ScHARR), The University of Sheffield, Sheffield, S14DA, UK; l.sabir{at}sheffield.ac.uk

Abstract

Background Sepsis is a major cause of morbidity and mortality and many tools exist to facilitate early recognition. This review compares two tools: the quick Sequential Organ Failure Assessment (qSOFA) and Early Warning Scores (National/Modified Early Warning Scores (NEWS/MEWS)) for predicting intensive care unit (ICU) admission and mortality when applied in the emergency department.

Methods A literature search was conducted using Medline, CINAHL, Embase and the Cochrane Library, together with handsearching of references and a grey literature search, with no language or date restrictions. Two authors selected studies, and quality assessment was completed using QUADAS-2. Area under the receiver operating characteristic curve (AUROC), sensitivities and specificities were compared.

Results 13 studies were included, totalling 403 865 patients. All reported mortality and six reported ICU admission.

The AUROC estimates ranged from little better than chance to good prediction of mortality (NEWS: 0.59–0.88; qSOFA: 0.57–0.79; MEWS: 0.56–0.75); however, individual papers generally reported higher AUROC values for NEWS than qSOFA. NEWS values demonstrated a tendency towards better sensitivity for ICU admission (NEWS ≥5, 46%–91%; qSOFA ≥2, 12%–53%) and mortality (NEWS ≥5, 51%–97%; qSOFA ≥2, 14%–71%) but lower specificity (ICU: NEWS ≥5, 25%–91%; qSOFA ≥2, 67%–99%; mortality: NEWS ≥5, 22%–91%; qSOFA ≥2, 58%–99%).

Conclusion The wide range of AUROC estimates and high heterogeneity limit our conclusions. Allowing for this, the NEWS AUROC was consistently higher than that of qSOFA within individual papers. Both scores allow threshold setting, determined by the preferred compromise between sensitivity and specificity. At established thresholds, NEWS tended towards higher sensitivity and qSOFA towards higher specificity.

PROSPERO registration number CRD42019131414.

  • emergency department
  • infectious diseases
  • intensive care
  • clinical assessment
  • death/mortality

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information

Key messages

What is already known on this subject

  • Recognition of sepsis is challenging and its definition has been revised several times; most recently, the international consensus definition recommends the use of the quick Sequential Organ Failure Assessment (qSOFA) in the emergency department to rapidly identify those who are likely to have poor outcomes.

  • Several diagnostic and prognostic studies have compared qSOFA and SIRS, but few have assessed Early Warning Scores (EWS), despite these being more routinely used clinically. If EWS could provide the same information, then they could be used earlier and allow standardisation and streamlining of effort.

What this study adds

  • This is the first systematic review that focuses on head-to-head comparisons of the most widely used scores (EWS and qSOFA) in the same cohort at recommended thresholds.

  • It highlights the heterogeneity of the evidence: sepsis definitions, determination of scoring thresholds and relevance of outcomes of interest.

  • There is little to choose between these scores; however, at the current recommended thresholds, National EWS has better sensitivity than qSOFA, which has better specificity.

Introduction

Sepsis, defined as ‘life-threatening organ dysfunction due to a dysregulated host response to infection’1 is a leading cause of death worldwide. A global estimate of annual incidence is 31.5 million, and an estimated 5.3 million deaths annually.2 In the UK, an estimated 52 000 patients die with sepsis annually. Consequently, many guidelines exist to enable early recognition and treatment to improve outcomes.3

However, recognition is challenging, as reflected in the redefinition of sepsis over the years. Previously, the systemic inflammatory response syndrome (SIRS) criteria were used to identify sepsis (the Sepsis-1 definition),4 5 but these were replaced due to inadequate sensitivity and specificity.6 There have been two further International Consensus definitions (Sepsis-27 and Sepsis-3),8 with the latter recommending the use of the quick Sequential Organ Failure Assessment (qSOFA) in the emergency department (ED) to rapidly identify those who are more likely to have poor outcomes secondary to sepsis; a score of two or more predicts a 3-fold to 14-fold increase in the rate of in-hospital mortality.9
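
As a point of reference, the qSOFA score can be sketched as a simple count of three bedside criteria. The criteria used below (respiratory rate of 22/min or more, systolic blood pressure of 100 mm Hg or less, and altered mentation taken as a Glasgow Coma Scale below 15) come from the Sepsis-3 consensus definition, not from this review, whose score components are given only in the supplemental material. The function and parameter names are illustrative assumptions.

```python
def qsofa_score(resp_rate: float, systolic_bp: float, gcs: int) -> int:
    """Count how many of the three qSOFA criteria are met (0 to 3)."""
    criteria = (
        resp_rate >= 22,     # respiratory rate of 22/min or more
        systolic_bp <= 100,  # systolic blood pressure of 100 mm Hg or less
        gcs < 15,            # altered mentation (Glasgow Coma Scale below 15)
    )
    return sum(criteria)


def qsofa_positive(resp_rate: float, systolic_bp: float, gcs: int,
                   threshold: int = 2) -> bool:
    """Flag a patient whose score is at or above the chosen threshold.

    A threshold of 2 is the recommended cut-off; 1 is the lower
    alternative that some authors have proposed.
    """
    return qsofa_score(resp_rate, systolic_bp, gcs) >= threshold
```

Keeping the threshold as a parameter makes it straightforward to compare qSOFA ≥2 with qSOFA ≥1 on the same patients.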

Earlier management decisions such as intensive care unit (ICU) admission result in lower mortality10 11; a tool identifying those who may have poorer outcomes will facilitate these decisions.

In the acute setting, patients routinely have Early Warning Scores (EWS) calculated from physiological parameters. These scores are not condition-specific but are designed to indicate deterioration and trigger a response. The two most common EWS have been included in this review: the Modified EWS (MEWS) and the National EWS (NEWS). The online supplemental material gives detailed information on the score components. In the UK, there has been a drive to make these scores consistent across all hospitals.12 A NEWS of 5 or more has been validated as a way of detecting patients with suspected sepsis who are at risk of deterioration and is recommended by National Health Service England.13

Several studies have compared qSOFA and SIRS.14 Few studies have assessed EWS, despite these being more routinely used clinically. If EWS are as accurate as other scoring tools, then they could be used earlier and allow standardisation and streamlining of effort. Additionally, looking specifically at studies that compare EWS and qSOFA would allow direct comparison of the tests applied to the same population rather than looking at one scoring system in isolation.

This systematic review aims to compare qSOFA with EWS (NEWS/NEWS2/MEWS) in predicting ICU admission and mortality in ED patients.

Objective

To compare the accuracy of qSOFA with EWS (NEWS/NEWS2/MEWS) at predicting ICU admission and/or mortality in adult ED patients with suspected sepsis.

Methods

This study was registered on the PROSPERO database (CRD42019131414).15

Data sources and search strategy

This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines.16

After conducting scoping searches, the studies included were identified by searching the following electronic databases: Medline (OVID), CINAHL (EBSCO), Embase (OVID) and Cochrane Library.

Reference lists for eligible papers were hand searched to identify additional studies. Google Scholar was used to forward search to identify additional studies that have subsequently cited eligible papers. In addition, Open Grey and the Grey Literature Report were searched as well as ClinicalTrials.gov for ongoing trials. Authors of included papers needing additional data were contacted.

The search strategy was conducted using relevant subject headings for each database (such as Medical Subject Headings for MEDLINE) and free-text search terms (figure 1). There were no date restrictions; studies published up to the search date (January 2019, rerun in March 2019) were included. There were no language restrictions or methodological search filters to limit study design.

Figure 1

Keyword combinations used in literature searches. EWS, Early Warning Score; qSOFA, quick Sequential Organ Failure Assessment.

Inclusion and exclusion criteria

Figure 2 demonstrates the inclusion and exclusion criteria.

Figure 2

Inclusion and exclusion criteria. MEWS, Modified Early Warning Score; NEWS, National Early Warning Score; qSOFA, quick Sequential Organ Failure Assessment; RCT, randomised controlled trial.

Participants

The population included were adult patients with suspected sepsis. Paediatric and obstetric patients were excluded as specific scoring methods are used for these groups.

Given the diagnostic difficulties and nomenclature changes, patients with ‘suspected sepsis’ included those meeting the Sepsis-1 (SIRS response and infection), Sepsis-2 or Sepsis-3 definitions or the National Institute for Health and Care Excellence criteria.1 4 8 They also included those identified by International Classification of Disease codes, by clinicians or by laboratory results (such as cultures).

Index test

The index tests of interest were the qSOFA score and the NEWS/NEWS2/MEWS score. Studies that did not include both applied to the same cohort were excluded, as were those which only looked at one score in isolation. This allowed direct comparison of threshold effects in the same population.

Outcomes

The primary outcome was accuracy of scoring methods to predict ICU admission, with the secondary outcome being mortality. Sensitivity, specificity, positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), and the area under the receiver operating characteristic curve (AUROC) were used to compare the tools.
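
The AUROC used to compare the tools can be read as the probability that a randomly chosen patient who had the outcome scored higher than a randomly chosen patient who did not, with ties counted as half. The sketch below computes it directly from that definition. It is an illustration only, not the method used by the included studies, and the function name and input layout are assumptions.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC as the chance that a patient with the outcome outscores one without.

    `outcomes` uses 1 for the event (eg, death or ICU admission) and 0 otherwise.
    Tied scores count as half a win.
    """
    with_outcome = [s for s, y in zip(scores, outcomes) if y == 1]
    without_outcome = [s for s, y in zip(scores, outcomes) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient with and one without the outcome")

    wins = 0.0
    for p in with_outcome:
        for n in without_outcome:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(with_outcome) * len(without_outcome))
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, which is acceptable for a worked example but not for large cohorts.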

Setting

Studies were included where the scoring methods had been applied in an emergency setting where undifferentiated patients are initially assessed by a clinician. This allows assessment of applicability for use in an acute/emergency setting. ICU studies were excluded as these populations were likely to have been considered for higher level care based on either these scores or certain physiological parameters already.

Study selection

All identified articles were collated in a referencing software (www.zotero.org/) and duplicates removed. Titles and/or abstracts of the studies retrieved by the search strategy were screened independently by both reviewers (LS and SR) using a pre-specified screening selection tool. The full text of those that met criteria or were ambiguous were assessed by two authors independently (LS and SR) and discrepancies identified and resolved.

Data extraction

Data were extracted by LS using a standardised and piloted data extraction form. Extracted information included study characteristics (author, year, country, funding, study design and sample size); patient characteristics (age, sex); location; definitions of sepsis; index tests; time of score measurement; and ICU admission and mortality, including sensitivity and specificity data.

Quality appraisal

Quality was assessed using the QUADAS-2 tool for Quality Assessment of Diagnostic Accuracy Studies.17 It comprises four domains covering patient selection, index test, reference standard and flow and timing. All were assessed in terms of the ‘risk of bias’, and the first three domains were also assessed in terms of ‘concerns regarding applicability’. The tool signalling questions were used independently by LS and SR and discrepancies discussed. Each item was scored ‘low’, ‘high’ or ‘unclear’. No studies were excluded, but quality issues were considered. Studies scoring ‘low’ on all four domains were considered low risk of bias and applicability. Any that scored ‘high’ or ‘unclear’ were considered ‘at risk of bias and concerns regarding applicability’.

Statistical analysis

Statistical analysis was conducted using Microsoft Excel. The results showed heterogeneity in study population selection, definitions of sepsis and definitions of the outcomes of interest, which precluded meaningful meta-analysis. Therefore, as per the Centre for Reviews and Dissemination guidelines for systematic reviews,18 a descriptive narrative synthesis has been presented rather than a meta-analysis.

For studies where the PPV, NPV, PLR or NLR were not given, these have been calculated producing 2×2 tables using the reported measures of accuracy, prevalence and sample sizes given.
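
The reconstruction described above can be sketched as follows. This is a minimal illustration rather than the authors' Excel workbook: the names are hypothetical, all proportions are expressed on a 0 to 1 scale, and cell counts are left as unrounded expected values, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: float  # true positives: outcome present, score at or above threshold
    fp: float  # false positives: outcome absent, score at or above threshold
    fn: float  # false negatives: outcome present, score below threshold
    tn: float  # true negatives: outcome absent, score below threshold


def reconstruct_table(sensitivity: float, specificity: float,
                      prevalence: float, n: int) -> TwoByTwo:
    """Rebuild expected 2x2 cell counts from reported accuracy, prevalence and sample size."""
    with_outcome = prevalence * n
    without_outcome = n - with_outcome
    tp = sensitivity * with_outcome
    tn = specificity * without_outcome
    return TwoByTwo(tp=tp, fp=without_outcome - tn, fn=with_outcome - tp, tn=tn)


def derived_measures(t: TwoByTwo) -> dict:
    """PPV, NPV, PLR and NLR from a 2x2 table.

    Raises ZeroDivisionError when a denominator is zero, for example the
    PLR when specificity is exactly 1.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "plr": sens / (1 - spec),
        "nlr": (1 - sens) / spec,
    }
```

For an invented study of 1000 patients with 10% mortality, sensitivity 0.80 and specificity 0.60, `reconstruct_table(0.8, 0.6, 0.1, 1000)` gives 80 true positives, 360 false positives, 20 false negatives and 540 true negatives, from which the PPV is about 0.18 and the NPV about 0.96.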

When comparing AUROC, the commonly used definition of >0.9 excellent, 0.8–0.9 good, 0.7–0.8 fair, 0.6–0.7 poor is used.19
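
These bands can be written as a small lookup. The handling of values that sit exactly on a shared boundary (0.7 or 0.8 assigned to the higher band) and the label for values below 0.6 are assumptions, since the definition above does not specify them.

```python
def auroc_band(value: float) -> str:
    """Map an AUROC estimate to the descriptive band used in this review."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if value > 0.9:
        return "excellent"
    if value >= 0.8:  # assumption: exactly 0.8 counts as good
        return "good"
    if value >= 0.7:  # assumption: exactly 0.7 counts as fair
        return "fair"
    if value >= 0.6:
        return "poor"
    return "below the stated bands"  # assumption: no label is given under 0.6
```

For example, `auroc_band(0.88)` returns "good" and `auroc_band(0.59)` returns "below the stated bands".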

Results

Study identification

Figure 3 shows that the search strategy identified 1124 articles. Initial screening based on the inclusion and exclusion criteria and strategy described, followed by examination of full-text articles, resulted in 13 studies being included in the analysis.20–32 One non-English language paper was identified and translated.27

Figure 3

Flow diagram of study selection and reasons for full-text exclusion. MEWS, Modified Early Warning Score; NEWS, National Early Warning Score; qSOFA, quick Sequential Organ Failure Assessment.

Study characteristics

All 13 studies were published between 2017 and 2019. Three were conducted in the UK,20 23 26 three in the Netherlands,25 31 32 two in the USA,21 22 two in Italy,28 29 one in Singapore,24 one in Spain27 and one in China.30 Eleven were single-centre studies,20–22 24 26–32 and two were multicentre.23 25 Study design included 10 observational studies,21–25 27–31 and 3 cohort studies.20 26 32 Ten of the studies reported the prognostic accuracy of previously developed scoring systems,20–23 25 28–32 and three compared a novel scoring method with existing tools.24 26 27 Online supplemental material summarises the study characteristics.

Participant characteristics, methodology, and outcomes

Table 1 shows the participant characteristics. The 13 studies totalled 403 865 patients. The proportion of women varied between 34.35% and 53%. Four studies included the ED but also included wards, medical assessment units or direct specialty referrals.20 21 23 26 Where given, the ED results were separated for comparison. One study looked at an ED high-dependency unit (ED-HDU), where the HDU was based in the ED and managed by emergency clinicians.28 Two studies looked at all patients presenting to the ED and applied the scoring criteria to all of these patients to assess the diagnostic ability to predict sepsis.22 31 Patients were included in whom there was a suspicion of sepsis. This suspicion was either purely clinical,20 clinical but initially flagged by a scoring system (such as a NEWS of 3 or more)23 or by other abnormal physiological parameters,27 30 based on triage category,24 25 based on blood cultures and intravenous antibiotics,21 22 32 or based on final diagnosis or coding.22 26 28 29 31

Table 1

Participant characteristics

Seven studies compared NEWS and qSOFA,20 22–24 26 27 32 two compared both NEWS and MEWS with qSOFA21 25 and four compared MEWS and qSOFA.28–31

Scores were either calculated at time of arrival,20 22 24 retrospectively from records21 25 28–32 or from a vital signs database,26 calculated from the point sepsis was suspected,23 or finally, both initial and worst measurements used.24

Four studies did not investigate admission to ICU,22 27 30 32 and one study reported scores on admission to ICU but not the proportion admitted.28 Two studies combined ICU admission with either medium care unit (MCU) admissions25 or in-hospital mortality when giving the proportion.26 Four studies reported the proportion of patients admitted to ICU: two of these had no further analysis of this outcome,23 31 and two had further analysis but only as a composite outcome with in-hospital mortality21 or with in-hospital mortality and intubation.24 Only two studies investigated ICU admission as a separate outcome and provided AUROC values29 or both AUROC and sensitivity and specificity data.20 The prevalence of admission to ICU varied from 3%20 to 26%.21 The study reporting the highest proportion only included suspected sepsis cases based on intravenous antibiotics and blood cultures, and its patients were therefore likely to have a higher severity and hence a higher rate of admission to ICU.

The definition of mortality varied: eight studies reported in-hospital mortality,20–22 25–27 29 31 with one also reporting sepsis-related in-hospital mortality,22 while others reported 30-day in-hospital mortality,24 30-day all-cause mortality23 32 or 28-day mortality.28 30 Mortality varied between 3.6%31 and 31%.28 The paper with the lowest mortality included all medical patients, not just those with suspected sepsis. Conversely, the paper reporting the highest mortality selected ED-HDU patients, a setting described by the authors as a ‘sub-ICU’, and therefore already excluded a lower risk cohort.

Quality appraisal

Three studies were considered to have a low risk of bias,20 25 26 and the remainder were considered at risk of bias or to raise concerns regarding applicability (table 2). The most common category of concern was the ‘flow and timing’ domain, which relates to the timings of the index test and reference standard and the length of follow-up.

Table 2

Quality assessment (QUADAS-2)

Eight authors were contacted21–24 26–30 for further statistical data including confidence intervals of the extracted data. Five authors21–24 26 replied with data within the allocated time frame and these have been included.

ICU admission

Table 3 details the summary statistics for the four studies that reported ICU admission: one reported this outcome alone20 and the remainder reported it as a composite with mortality,21 as ICU admission or mortality,26 or as a composite with mortality and intubation.24 For ICU admission alone, using the recommended cut-offs (NEWS ≥5, qSOFA ≥2), Goulden et al reported a higher sensitivity for NEWS than qSOFA, which had a better specificity. The difference in AUROC was not shown to be statistically significant, but the value for NEWS (0.64) was higher than that for qSOFA (0.59) (table 4).

Table 3

Study results: predicting ICU admission,20 composite of ICU admission and mortality,21 ICU admission or mortality,26 composite of ICU admission, intubation and in-hospital death24

Table 4

AUROC data for ICU admission and mortality for qSOFA, NEWS and MEWS

For the combined outcomes, NEWS results were fair (0.70,24 0.72,21 0.75,25 0.7926) compared with qSOFA (respectively 0.63,24 0.62,21 0.72,25 0.6826).

In the studies reporting combined outcomes, the general trend is similar, with NEWS demonstrating a higher sensitivity and lower specificity than qSOFA at the recommended thresholds. NEWS ≥5 demonstrated a higher sensitivity than MEWS ≥5, which in turn appears to have a higher sensitivity and lower specificity than qSOFA ≥2; however, only two studies looked at MEWS for ICU admission.

Mortality

Seven studies reported AUROC data for in-hospital mortality for NEWS and qSOFA. The ranges of estimates overlap, but the individual studies mostly report a higher AUROC for NEWS (0.65,20 0.67,25 0.73,27 0.79,26 0.80,21 0.8822) than qSOFA (0.62,20 0.68,25 0.67,27 0.68,26 0.71,21 0.7922). AUROC for 30-day in-hospital mortality demonstrated a similar pattern (NEWS: 0.70,24 0.7832; qSOFA: 0.65,24 0.7032). One study reported data suggesting that both NEWS (0.59) and qSOFA (0.57) were poor predictors of mortality.23 The study that reported sepsis-related mortality demonstrated an AUROC of 0.95 for NEWS and 0.87 for qSOFA.22 However, specificity in this study was high for both qSOFA ≥2 (98.7%) and NEWS ≥5 (90.2%), which is likely due to the inclusion of patients with other diagnoses as well as suspected infection, as the authors wanted to show its use as a triage tool. A similarly high specificity is seen in two other papers that looked not only at suspected sepsis but also at patients with other diagnoses26 31 (table 5).

Table 5

Predicting mortality (routine practice cut-offs (qSOFA ≥2 and NEWS ≥5))

Ten studies reported data on qSOFA ≥2, which generally demonstrated high specificity but low sensitivity.20–26 31 32 Lowering the threshold to qSOFA ≥1 improved the sensitivity but with the expected reduction in specificity.20–22 25–27 32 The NEWS and MEWS scores were more variable; generally, NEWS ≥5 had a higher sensitivity than qSOFA ≥2 (50.5%–97.3% and 13.6%–71.1%, respectively) and a lower specificity, with some authors suggesting that NEWS ≥7 is the optimal cut-off.21

Discussion

To our knowledge, there are no previous systematic reviews comparing qSOFA and EWS for predicting ICU admission or mortality in the emergency setting for suspected sepsis. That all of the included publications are recent likely reflects the recent implementation of EWS, in particular the modified versions (MEWS/NEWS). None of the studies included NEWS2, which is likely to change given its adoption nationally in the UK in March 2019.33

The results demonstrate AUROC estimates were variable ranging from little better than chance to good prediction for mortality (NEWS: 0.59–0.88; qSOFA: 0.57–0.79; MEWS 0.56–0.75). However, individual papers mostly reported higher AUROC values for NEWS than qSOFA when compared directly at predicting mortality and ICU admission. This could be because NEWS includes more variables than qSOFA, and also has more points (20 vs 3).

At the commonly used thresholds, NEWS trended towards better sensitivity than qSOFA for predicting ICU admission and mortality, whereas qSOFA trended towards better specificity. Consequently, at the recommended thresholds, NEWS may be less likely to miss serious sepsis, which is necessary in an emergency setting; however, it may result in more overtriage than qSOFA. These results are consistent with previous criticisms of qSOFA as having high specificity for early risk assessment but low sensitivity34 and raise the question of whether qSOFA ≥1 should be used instead.35 These thresholds are set for a particular purpose: NEWS ≥5 to identify those at risk of acute deterioration and trigger a response; qSOFA ≥2 to recognise those at risk of poorer outcomes.

The results have potential for a threshold effect; changing the NEWS threshold to a higher value would increase its specificity, at the expense of sensitivity whereas in ED triage a balance has to be struck so as not to miss those that are seriously unwell.
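
The threshold effect can be illustrated with a short sketch. The scores and outcomes below are invented for ten hypothetical patients and are not drawn from any included study; the function name is likewise an assumption.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[int], outcomes: Sequence[int],
              threshold: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= threshold' counts as a positive test."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Invented early warning scores for ten hypothetical patients
# (outcome 1 = died, 0 = survived).
scores = [2, 3, 4, 5, 6, 6, 7, 8, 9, 11]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

for threshold in (5, 7):
    sensitivity, specificity = sens_spec(scores, outcomes, threshold)
    print(f"score >= {threshold}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

In this toy cohort, raising the threshold from 5 to 7 lowers sensitivity from 1.00 to 0.75 and raises specificity from 0.50 to 0.83, which is the trade-off described above.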

More research is necessary before accurate conclusions can be drawn to change clinical practice. Given that the population is living longer with chronic conditions, this should specifically include studies that report on possible differing physiological changes by age and on comorbidities, as these factors are also likely to affect in-hospital mortality and admission to ICU.

Limitations

Due to the marked heterogeneity in patient selection, outcomes and settings, the data could not be pooled for meta-analysis. Including studies reporting only one scoring method would increase the number of articles included and hence would likely produce more results that could be grouped for meta-analysis. However, because the groups studied would by definition be different, this would lead to significant bias and heterogeneity.

Regarding ICU admission, only two studies reported this alone; the remainder reported composite outcomes without complete datasets for comparison. Furthermore, patients who die are often older than the overall study population and are mostly not admitted to ICU,20 36 for reasons such as ceilings of care or do-not-resuscitate decisions; this is likely to affect a prediction model for ICU admission. Most of the studies did not look at factors such as comorbidities and age, but where these have been investigated, the conclusions drawn are different.25 de Groot et al25 demonstrated that, for older patients, the diagnostic performance of each score for in-hospital mortality in the higher scoring categories (eg, NEWS ≥7) is not as extreme as that for younger patients, which relates to different physiological responses to sepsis in the elderly.

Finally, studies have calculated the scores at different times; for it to be effective in the emergency setting, the scores need to be taken as early as possible (initial triage) to help with decision making and predicting those that are likely to have a poorer outcome. The immediate decision often relates to the appropriateness of ICU-level care.

Conclusions

Overall, there is a wide range of results and high degree of heterogeneity between the studies which limits the ability to draw conclusions.

There is little to choose between these scores; however, at the current recommended thresholds, NEWS has better sensitivity than qSOFA, which has better specificity. Consequently, NEWS may be less likely to miss serious sepsis but may result in more overtriage than qSOFA.

Footnotes

  • Handling editor Kirsty Challen

  • Twitter @shammi_ram

  • Contributors All authors made substantial contribution to the conception and design (LS, SR and SG), search strategy, study selection, data extraction (LS and SR), analysis and interpretation (LS, SR and SG). LS drafted the article and all other authors revised it critically.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests SG is chief investigator for the PHEWS study (Pre-Hospital Early Warning Scores for Sepsis), funded by the National Institute for Health Research Health Technology Assessment Programme (Reference 17/136/10). This paper was not undertaken as part of the PHEWS study.

  • Provenance and peer review Not commissioned; externally peer reviewed.
