Determination of the best early warning scores to predict clinical outcomes of patients in the emergency department
  1. William Spencer1,
  2. Jesse Smith2,
  3. Patrick Date3,
  4. Erik de Tonnerre4,
  5. David McDonald Taylor5,6
  1. 1 Alfred Health, Melbourne, Victoria, Australia
  2. 2 Central Gippsland Health, Sale, Victoria, Australia
  3. 3 Austin Hospital, Heidelberg, Victoria, Australia
  4. 4 Northern Sydney Local Health District, NSW Health, NSW, Australia
  5. 5 Emergency Department, Austin Health, Heidelberg, Victoria, Australia
  6. 6 Department of Medicine, University of Melbourne, Melbourne, Victoria, Australia
  1. Correspondence to Professor David McDonald Taylor, Emergency Department, Austin Health, Heidelberg, VIC 3084, Australia; david.taylor{at}


Objective Early warning scores (EWS) are used to predict patient outcomes. We aimed to determine which of 13 EWS, based largely on emergency department (ED) vital sign data, best predict important clinical outcomes.

Method We undertook a prospective cohort study in a metropolitan, tertiary-referral ED in Melbourne, Australia (February–April 2018). Patient demographics, vital signs and management data were collected while the patients were in the ED, and EWS were calculated using the criteria of each EWS. Outcome data were extracted from the medical record (2-day, 7-day and 28-day inhospital mortality, clinical deterioration within 2 days, intensive care unit (ICU) admission within 2 days, admission to hospital). The area under the receiver operating characteristic curve (AUROC; 95% CI) was used to evaluate the predictive ability of each EWS for each outcome.

Results Of 1730 patients enrolled, 690 patients were admitted to the study hospital. Most EWS were good or excellent predictors of 2-day mortality. When considering the point estimates, the VitalPac EWS was the most strongly predictive (AUROC: 0.96; 95% CI: 0.92 to 0.99). However, when considering the 95% CIs, there was no significant difference between the highest performing EWS. The predictive ability for 7-day and 28-day mortality was generally less. No EWS was a good predictor for clinical deterioration (AUROC range: 0.54–0.70), ICU admission (range: 0.51–0.72) or admission to hospital (range: 0.51–0.68).

Conclusion Several EWS have excellent predictive ability for 2-day mortality and have the potential to risk stratify patients in ED. No EWS adequately predicted clinical deterioration or admission to either ICU or the hospital.

  • emergency department
  • assessment
  • clinical management

What this paper adds

What is already known on this subject

  • Early warning scores (EWS), based largely on patient vital sign and observation data, were developed to identify patients at high risk of undesirable outcomes.

  • With the advent of electronic medical records and automatic vital sign measurement, EWS could be continuously calculated and high-risk patients identified for review.

  • However, the most suitable EWS for use in the emergency department (ED) setting is not known.

What this study adds

  • We compared 13 EWS for their ability to identify patients in ED at high risk of mortality, clinical deterioration, and admission to hospital or the intensive care unit.

  • Most EWS were highly predictive of mortality, especially within 2 days, but all performed poorly for the other outcomes.


Introduction

Altered physiology, as reflected in abnormal vital signs and other observations, often precedes patient deterioration and death.1–4 Accordingly, a range of early warning scores (EWS) have been designed to identify patients at risk of these undesirable outcomes. Most EWS employ routinely collected vital sign data, although these may be supplemented with other clinical observation data and test results. Some EWS have been designed for specific presentation types (eg, the Trauma Injury Severity Score (TRISS)5) or for specific clinical conditions (eg, the quick Sepsis Related Organ Failure Assessment (qSOFA) score for sepsis6). Others have been designed for use in broad patient populations in a range of settings (eg, the acute medical assessment unit,7–9 the emergency department (ED)10–12).

The identification of patients at risk of undesirable outcomes is of particular importance in the ED. Patients are often undifferentiated with confusing clinical presentations. This often precludes reasonable certainty regarding diagnoses, management pathways and prognosis. Clinicians may not be able to accurately predict patients most at risk as clinical judgement often has a subjective component.13 However, EWS may assist ED clinicians to risk stratify their patients. This will inform the most appropriate management decisions including medication regimens, referral timing and place of disposition. This, in turn, may reduce the incidence of undesirable outcomes and improve patient outcomes.14–16

Although a number of EWS have been designed for use in undifferentiated ED and acute medical unit settings, the most suitable has not been determined. Desirable features will include simplicity, ease of use and good predictive ability for the outcomes of interest. This study aimed to compare the performance of 13 EWS in predicting important clinical outcomes within the same patient population in ED. The findings will inform systems development for the integration of EWS into electronic medical records (eMR) capable of analysing routinely collected observation data and flagging patients at risk.


Methods

Study setting and population

We undertook a prospective cohort study in the ED of a metropolitan, tertiary-referral teaching hospital in Melbourne, Australia, between February and April 2018 inclusive. The department has an annual mixed (adult and paediatric) patient census of approximately 85 000. The Human Research Ethics Committee of Austin Health, Heidelberg, Australia approved the study.

Patients were included if they were admitted to an ED cubicle or resuscitation bay. Patients were excluded if they were aged under 18 years or if a full set of vital signs was not recorded at the same time, at least once during their ED stay. Patients transferred to another hospital after the above criteria were met were included in data analysis of prediction of admission but were excluded from analysis of other outcomes.

A convenience sample of 2000 consecutive patients was identified using the ED eMR. These were consecutive patients identified during periods when the principal investigator (WS) was present in the ED (mostly on weekdays, 09:00–17:00 hours).

Study protocol

We compared the ability of 13 EWS to predict important outcomes of undifferentiated patients in ED: the Rapid Acute Physiology Score (RAPS),17 Modified EWS (MEWS),7 Modified EWS with Glasgow Coma Scale (GCS) (MEWS GCS),10 Rapid Emergency Medicine Score (REMS),11 18 Goodacre Score,19 Worthing Physiological Score (WPS),20 Groarke Score,21 VitalPac EWS (ViEWS),9 Abbreviated VitalPac EWS (AbViEWS),22 Glasgow Coma Scale-Age-Systolic Blood Pressure Score,23 Vital Sign Score (VSS),24 National EWS (NEWS)8 and Vital Sign Group (VSG) Scores.12

The EWS were selected following a literature review and using inclusion and exclusion criteria outlined in online supplementary appendix 1. In brief, the EWS needed to have been used in the ED or similar setting, and to use mainly vital sign data to generate a numerical score that predicted clinical outcomes (including death). EWS designed for specific clinical conditions (eg, sepsis) or patient subgroups (eg, psychiatry) were excluded. The patient data items used to generate each EWS are described in online supplementary appendix 2. They included age, vital signs, level of consciousness, oxygen saturation, the need for supplemental oxygen and airway interventions, and seizure activity. Some of the selected EWS were designed with the explicit intention of use in the ED population. Most, however, were derived and validated outside of the ED and, therefore, have limited research supporting their use in this setting (table 1).

Supplemental material

Table 1

EWS derivation populations and outcomes predicted

In accordance with usual practice, the ED nursing staff recorded all patient observation data. These were then extracted by the principal investigator from the eMR while the patient was in the ED. The need for airway intervention and the presence of seizures (both needed to generate the VSS) were also extracted from the eMR and/or confirmed with the attending nursing staff at the end of the ED attendance. EWS were calculated for each set of vital signs recorded, as per online supplementary appendix 2. If a vital sign was missing from a particular vital sign set, it was presumed to be normal. For most scores, the highest EWS during a patient’s ED stay was used for analysis. However, the VSSinitial was calculated from data at triage or admission to the cubicle, whichever had the first set of vital signs. The VSG Scores were calculated according to their designated time points (on admission, at any time or throughout the entire ED stay).
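The two handling rules described above (a missing vital sign is presumed normal, and the highest score across the ED stay is analysed) can be sketched in a few lines. This is only an illustration: the scoring rule `toy_score`, the field names and the values in `ASSUMED_NORMALS` are hypothetical and are not taken from any of the 13 EWS or from the study's appendices.

```python
from typing import Callable, Dict, List, Optional

VitalSet = Dict[str, Optional[float]]

# Hypothetical "normal" values, used only to fill gaps in this sketch.
ASSUMED_NORMALS: Dict[str, float] = {
    "resp_rate": 16.0,
    "heart_rate": 75.0,
    "systolic_bp": 120.0,
}


def toy_score(v: Dict[str, float]) -> int:
    """Illustrative scoring rule only; NOT a published EWS."""
    points = 0
    if v["resp_rate"] > 24:
        points += 2
    if v["heart_rate"] > 110:
        points += 1
    if v["systolic_bp"] < 90:
        points += 3
    return points


def fill_missing(vitals: VitalSet) -> Dict[str, float]:
    """Presume any missing vital sign to be normal."""
    filled: Dict[str, float] = {}
    for name, normal in ASSUMED_NORMALS.items():
        value = vitals.get(name)
        filled[name] = normal if value is None else float(value)
    return filled


def highest_score(sets: List[VitalSet],
                  score: Callable[[Dict[str, float]], int]) -> int:
    """Score every vital sign set and keep the highest value for the stay."""
    return max(score(fill_missing(s)) for s in sets)


stay = [
    {"resp_rate": 18, "heart_rate": 95, "systolic_bp": 118},
    {"resp_rate": 26, "heart_rate": None, "systolic_bp": 85},  # heart rate missing
]
print(highest_score(stay, toy_score))
```

Because a missing item contributes no points, this rule can only lower a score, which is the under-estimation risk the authors acknowledge in their limitations.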

Key outcome measures

The primary outcome was 2-day (48 hours), 7-day and 28-day inhospital mortality after admission to hospital from the ED. Secondary outcomes were a clinical deterioration within 2 days of admission (defined as cardiorespiratory arrest or a medical emergency team call), an intensive care unit (ICU) admission within 2 days and admission to hospital. Twenty-eight days after discharge from the ED, the patients’ medical records were accessed by the principal investigator and data on the study outcomes were extracted. The exception was admission to hospital, which was known at the end of the patient’s ED stay.

A random selection of 10% of the medical record data was reviewed by co-investigators (PD, JS and EDT). One transcription error of demographic data was found and corrected. No errors of vital sign or outcome data were found. No evidence for incorrect EWS calculation was found.

Data analysis

The EWS were analysed for their ability to predict the outcomes of interest. The area under the receiver operating characteristic curve (AUROC), with 95% CIs, was calculated for each score using SPSS for Windows statistical software (V.24.0, SPSS, Chicago, IL, USA). EWS with AUROC values ≤0.599 were considered failed predictors of that outcome. EWS with AUROC values between 0.60 and 0.69, 0.70 and 0.79, 0.80 and 0.89, and ≥0.90 were considered poor, fair, good and excellent predictors of the outcome, respectively.25
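As a minimal sketch of what the AUROC measures and how the descriptive bands apply, the code below computes the AUROC through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen patient with the outcome has a higher score than one without it, ties counting one half). This is not the SPSS procedure the authors used, the function names are hypothetical, and treating the bands as half-open intervals (for example, 0.60 up to but not including 0.70) is an assumption made here for unrounded values.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC as the proportion of (outcome, no-outcome) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients both with and without the outcome")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classify_auroc(value: float) -> str:
    """Map an AUROC to the descriptive bands used in the text."""
    if value < 0.60:
        return "failed"
    if value < 0.70:
        return "poor"
    if value < 0.80:
        return "fair"
    if value < 0.90:
        return "good"
    return "excellent"


# Toy data: four patients, two of whom had the outcome.
example = auroc([2, 4, 4, 6], [0, 0, 1, 1])
print(example, classify_auroc(example))
```

Applied to values reported in this study, `classify_auroc(0.96)` gives "excellent" (ViEWS, 2-day mortality) and `classify_auroc(0.715)` gives "fair" (WPS, ICU admission).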

The sample size calculation was based on the primary outcome. To have a 95% chance that the sample mortality would lie within ±2% of the expected mortality (5%) from our institution’s data, a sample size of at least 475 admitted patients was required.
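The authors do not state which formula they used. As a rough check only, the usual normal-approximation formula for estimating a single proportion, n = z²p(1 − p)/d², can be evaluated for the stated inputs (expected mortality p = 0.05, margin d = 0.02, 95% confidence). The function name below is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p: float, margin: float,
                               confidence: float = 0.95) -> int:
    """Smallest n for which the normal-approximation CI half-width is <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_required = sample_size_for_proportion(p=0.05, margin=0.02)
print(n_required)
```

Under this approximation the result is 457, slightly below the 475 reported; the difference may reflect a different formula, a correction or rounding that the text does not describe. Either figure is exceeded by the 690 admitted patients analysed.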


Results

Of the 2000 patients recruited, 270 were excluded from analysis (figure 1). The 47 patients transferred to another hospital were excluded because outcome data could not be obtained for them. In all, 1730 (86.5%) patients were analysed for admission to hospital: mean (SD) age 58.6 (21.3) years, 831 (48.0%, 95% CI: 45.7 to 50.4) were men. Their presenting complaints varied: abdominal pain (220 patients, 12.7%), chest pain (183, 10.6%), trauma (147, 8.5%), shortness of breath (132, 7.6%), falls (87, 5.0%), urinary problems (87, 5.0%), limb problems (77, 4.5%), gastroenteritis/bleed (65, 3.8%), dizziness (55, 3.2%) and other complaints (677, 39.1%). In total, 690 (41.0%) patients were admitted to the study hospital: mean (SD) age 64 (20.6) years, 359 (52.0%, 95% CI: 48.2 to 55.8) were men. These 690 patients were analysed for mortality, clinical deterioration and ICU admission.

Figure 1

Patient population recruitment flow. ED, emergency department.

Primary outcome measure

Of the patients admitted to hospital, 10 (1.4%), 18 (2.6%) and 32 (4.6%) patients died within 2, 7 and 28 days, respectively. The patients who died tended to be older (mean age: 74 years). Most EWS were good or excellent predictors of 2-day mortality (table 2). When considering the point estimates of the AUROC, ViEWS was the most strongly predictive, with NEWS and AbViEWS also both highly predictive (AUROC: 0.95 and 0.95, respectively). However, when considering the 95% CIs, there was no difference between the better performing EWS.

Table 2

EWS outcome measures

At 7 and 28 days, the EWS were less predictive of mortality. At 7 days, the best predictors were MEWS GCS, ViEWS and AbViEWS (AUROC (95% CI): 0.84 (0.76 to 0.93), 0.83 (0.71 to 0.94) and 0.82 (0.71 to 0.94), respectively). At 28 days, the best predictors were ViEWS, NEWS and AbViEWS (AUROC (95% CI): 0.82 (0.74 to 0.90), 0.82 (0.74 to 0.89) and 0.79 (0.70 to 0.88), respectively). The best predictors were based on the point estimates. However, when considering the 95% CIs, there were no differences.
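The comparison "when considering the 95% CIs" can be read as a check for overlapping intervals. The sketch below applies that check to the 7-day and 28-day intervals quoted above; the helper name is hypothetical. Overlap of CIs is an informal and conservative criterion rather than a formal test of the difference between two AUROCs, and the text does not say that any paired test was performed.

```python
from itertools import combinations
from typing import Dict, Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs for the best predictors, as quoted in the text.
seven_day: Dict[str, Interval] = {
    "MEWS GCS": (0.76, 0.93),
    "ViEWS": (0.71, 0.94),
    "AbViEWS": (0.71, 0.94),
}
twenty_eight_day: Dict[str, Interval] = {
    "ViEWS": (0.74, 0.90),
    "NEWS": (0.74, 0.89),
    "AbViEWS": (0.70, 0.88),
}


def all_pairs_overlap(cis: Dict[str, Interval]) -> bool:
    """Check every pair of scores for overlapping intervals."""
    return all(intervals_overlap(cis[x], cis[y])
               for x, y in combinations(cis, 2))


print(all_pairs_overlap(seven_day), all_pairs_overlap(twenty_eight_day))
```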

Secondary outcome measures

Of the patients admitted to hospital, 33 (4.8%) patients deteriorated (as defined) within 2 days. All EWS were failed or poor predictors of clinical deterioration within 2 days (table 2). The AUROC values ranged from 0.54 (VSG ‘Throughout’) to 0.70 (MEWS GCS). In all, 24 (3.5%) patients were admitted to ICU within 2 days. Most EWS were failed or poor predictors of ICU admission (table 2). The AUROC values ranged from 0.510 (Goodacre Score) to 0.715 (WPS). All EWS were failed or poor predictors of the need for admission to hospital (table 2). The AUROC values ranged from 0.510 (VSG ‘Throughout’) to 0.681 (MEWS GCS).


Discussion

This is the first prospective study to directly compare a large number of well-known EWS using the same undifferentiated adult patient population in ED. Numerous EWS have been developed. However, variable patient presenting complaints and the need for simple, routinely collected component items mean that complicated EWS are unlikely to be appropriate in the ED. We deliberately examined only EWS where score generation was relatively simple and that had been designed for non-specific patient populations within the ED or similar settings.

Primary outcome measure

All EWS examined were designed with the intention of predicting mortality and, in some cases, other outcomes. We found that most had good or excellent predictive ability for mortality, particularly in the short term. ViEWS, NEWS and AbViEWS were the best predictors of 2-day mortality.

It is difficult to directly compare our findings with those of others due to differing settings (eg, ED, acute medical unit, inpatient) and outcomes (eg, 1-day or 2-day mortality). However, there is some consistency within the literature. In a European ED, Alam et al 26 reported that NEWS has good predictive ability for 30-day mortality (AUROC: 0.87). In a Vietnamese ED, Ha et al 27 examined WPS and REMS and reported AUROC values of 0.80 and 0.71, respectively, for 30-day mortality. In an acute medical assessment unit, Smith et al 8 and Prytherch et al 9 examined the predictive abilities of NEWS and ViEWS, respectively, for 1-day mortality. Both EWS had an AUROC of 0.89. In the same setting, Braband et al 28 reported that WPS, REMS, NEWS and Goodacre also had good predictive ability for 1-day mortality (AUROC: 0.85, 0.84, 0.83 and 0.82, respectively). Collectively, these reports and the findings of this study suggest that both ViEWS and NEWS are important predictors of mortality in the short term after hospital admission.

ViEWS, NEWS and AbViEWS, our best predictors of 2-day mortality based on the point estimates of the AUROC, each employ the data items of respiratory and heart rate, temperature, systolic blood pressure, oxygen saturation and the need for supplemental oxygen. ViEWS and NEWS also employ a level of consciousness item (alert, response to verbal or painful stimuli, unresponsive (AVPU)). While AVPU is relatively simple, it is an additional, possibly subjective data item that needs to be measured. Importantly, there were no significant differences between ViEWS, NEWS and AbViEWS. Hence, as AbViEWS is simpler and performs as well, our findings suggest that it may be the best choice for widespread use in the ED to identify patients at risk of critical illness who may benefit from early review by senior staff.

Only ViEWS and NEWS maintained good predictive ability of 28-day mortality, a finding consistent with that of Alam et al.26 The finding that the predictive ability of all EWS decreased over time is not surprising and is consistent with the findings of Prytherch et al.9 As the patient’s condition changes and therapeutic interventions are introduced, their ED status becomes more remote and less relevant. From the ED perspective, prediction of 2-day mortality is the most clinically relevant outcome as it is more likely to affect decisions regarding ED management (eg, early administration of vasopressors) and place of disposition (eg, inpatient ward vs ICU).

Secondary outcome measures

All EWS were poor or failed predictors of clinical deterioration within 2 days of admission.

Consistent with this finding, Smith et al 8 reported that NEWS was only a fair predictor for cardiac arrest in the medical assessment unit (AUROC: 0.72, 95% CI: 0.69 to 0.76). The poor predictive ability of the EWS for this endpoint is not surprising as few were designed for the prediction of patient deterioration as defined in this study. Instead, most employed other definitions of deterioration, for example, the need for ICU admission or death.

Notwithstanding the relative paucity of cohort studies of EWS for the prediction of cardiopulmonary arrest, clinical intervention studies have demonstrated encouraging results. In particular, the use of MEWS in the inpatient setting has been associated with substantially lower rates of cardiopulmonary arrest.14–16 29 Patel et al 16 concluded that the adoption of MEWS may lead not only to improved patient care but also to more efficient use of available clinical resources.

Unlike clinical deterioration, several of the EWS examined were derived and validated for use in predicting ICU admission. It was surprising, therefore, that most EWS were poor predictors of ICU admission within 2 days. Only WPS and Groarke were fair predictors. In other ED studies, Alam et al 26 and Merz et al 24 reported that NEWS and VSSinitial, respectively, significantly predicted the need for ICU admission. However, as neither of these reports described AUROC curve values for this endpoint, direct comparisons with our findings are difficult. In the medical assessment unit, Groarke et al 21 reported that an increasing Groarke score was significantly associated with need for ICU admission. In the same setting, Smith et al 8 reported that NEWS was a good predictor of unanticipated ICU admission within 24 hours (AUROC: 0.86, 95% CI: 0.85 to 0.87). As ICU admission is subject to variables other than the patient’s condition (eg, policies, clinician opinion, bed availability), it may not be a useful outcome to evaluate EWS performance.

Few of the EWS examined were designed for the prediction of admission to hospital from the ED (table 1). It is not surprising, therefore, that all EWS were poor or failed predictors for this outcome. This finding is consistent with those of Alam et al 26 who reported that NEWS was a poor predictor of admission when calculated at three ED time points (AUROC curve range: 0.66–0.69).

The utility of an EWS in the ED setting will depend on a range of characteristics. First, it must have, at least, good predictive ability for important clinical outcomes. In this respect, our findings and those of others suggest that the available EWS are likely only to be of use in predicting mortality. Second, the generation of EWS should be simple and, wherever possible, automatic. This study found no statistically significant differences between ViEWS, NEWS and AbViEWS in regard to 2-day mortality prediction. However, as AbViEWS comprises one fewer data item (no AVPU), it is the simplest of the three. If the need for oxygen were made a compulsory data collection item in the eMR, then AbViEWS scores could be automatically calculated and patients with scores exceeding the threshold for review could be ‘flagged’. Other EWS may also be useful, especially if their component items are routinely recorded. Finally, the EWS review threshold should have high enough sensitivity and specificity to avoid the possibility of substantial workload increases associated with unnecessary patient reviews. Smith et al 8 addressed this by introducing the concept of an EWS efficiency curve. Their premise was that the more specific the EWS, the fewer scores there would be to trigger patient reviews. Hence, scores shown to be most specific would be the most useful.
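The trade-off between a review threshold's sensitivity, its specificity and the review workload it generates can be illustrated with a small calculation. This is a sketch only: the function name, the scores and the outcomes are all made up for illustration, and it is not the efficiency curve of Smith et al, although the `flagged_fraction` it returns is the same kind of workload quantity.

```python
from typing import Dict, Sequence


def review_threshold_metrics(scores: Sequence[float],
                             outcome: Sequence[bool],
                             threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and review workload for 'flag if score >= threshold'."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, outcome) if f and y)
    fn = sum(1 for f, y in zip(flagged, outcome) if not f and y)
    fp = sum(1 for f, y in zip(flagged, outcome) if f and not y)
    tn = sum(1 for f, y in zip(flagged, outcome) if not f and not y)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need patients both with and without the outcome")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "flagged_fraction": sum(flagged) / len(flagged),  # share of patients reviewed
    }


# Hypothetical scores and 2-day outcomes for eight patients.
toy_scores = [1, 2, 2, 3, 5, 6, 7, 9]
toy_died = [False, False, False, False, False, True, False, True]

at_5 = review_threshold_metrics(toy_scores, toy_died, threshold=5)
at_7 = review_threshold_metrics(toy_scores, toy_died, threshold=7)
print(at_5)
print(at_7)
```

In this toy data, raising the threshold from 5 to 7 halves the proportion of patients flagged for review but also misses one of the two deaths, which is the balance a review threshold has to strike.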

The ED is often only one point in a patient’s illness journey, with many traversing the pre-hospital, ED and inpatient settings. At present, the relative usefulness of individual EWS in each of these settings is not known. Furthermore, data collected from each setting and collated to provide a single EWS may increase the performance of the score.

This study has important limitations. The convenience sampling technique may have introduced selection bias. The severity of illness of patients admitted outside of our recruitment hours may have differed. Additionally, there is the potential for a seasonal bias with patients only enrolled during the warmer months. A considerable proportion (15.9%) of patients were excluded from analysis. This also may have introduced selection bias. Most patients were excluded because of incomplete data. Incomplete data hint at the possibility that these patients were not as sick as others. If so, our patient population may have been sicker than the overall ED population. Exclusion of some patients also decreased the effective sample size which will have decreased the precision of the AUROC calculation. It is recommended that future studies compare EWS performance on patients with and without complete data.

The primary outcome of 2-day mortality may not have been ideal. A 24-hour mortality may have been superior. Although conjecture, the finding of a patient at risk of death within 24 hours may provoke a more immediate clinical response than a perceived risk within the next few days.

The prospective study design, with all data (except outcomes) recorded while the patient was in the ED, will have helped to maximise data quality. The data quality assurance exercise uncovered only one erroneous data item. It is unlikely, therefore, that measurement bias substantially affected the results. However, if a vital sign was missing from a particular vital sign set, it was presumed to be normal. This may have resulted in under-estimation of scores.

With the exception of admission to hospital, the numbers of patients with the outcomes of interest were small in relation to the overall sample size. While a small change in the number of patients with these outcomes may have affected the results, the sample size was considerable and likely to have been sufficient to adequately reflect outcome incidence.

We intentionally did not record interventions that might have affected vital signs, for example, intravenous fluid boluses. While these may have affected (confounded) the outcomes of interest, none is used to generate an EWS and the true effect of these interventions would be difficult to determine.

As a single-centre study, its external validity may be limited. This is of particular importance as the study aimed to determine the EWS that would best be applicable to all undifferentiated patients in ED.

In summary, the EWS examined only adequately predicted mortality, especially in the short term. Their use for other outcomes is not recommended. The best predictors of mortality were ViEWS, NEWS and AbViEWS. AbViEWS performs as well as ViEWS and NEWS but is simpler, with one fewer component data item. It is recommended that EWS should be integrated into the eMR to allow the flagging of patients likely to benefit from review. Prospective studies could evaluate this intervention for its effects on mortality, timing of death, changes in the management of flagged patients and ED workload.



  • Contributors DT and WS designed the study and obtained ethics committee approval. WS, JS, PD and EdeT collected all the data. WS managed the data collection and entry into the study database. DT and WS undertook all data analysis and preparation of the manuscript. All authors contributed to revision of the manuscript and approved the final version. DT supervised the study overall.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Ethics approval The Austin Health Human Research Ethics Committee, Heidelberg, Victoria, Australia, approved the study. Approval number: LNR17Austin454.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Patient consent for publication Not required.