
THERM: the Resuscitation Management score. A prognostic tool to identify critically ill patients in the emergency department
G N Cattermole (1), E C H Liow (2), C A Graham (3), T H Rainer (3)

  1. Emergency Department, Princess Royal University Hospital, London, UK
  2. Department of Medicine, Canberra Hospital, Canberra, Australian Capital Territory, Australia
  3. Accident and Emergency Medicine Academic Unit, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

Correspondence to Professor T H Rainer, Accident and Emergency Medicine Academic Unit, Chinese University of Hong Kong, Rooms 107/113, Trauma and Emergency Centre, Prince of Wales Hospital, Shatin, New Territories, Hong Kong; thrainer{at}cuhk.edu.hk

Abstract

Introduction Prognostic scores are widely used in the emergency department (ED) to stratify risk for critically ill patients. The Prince of Wales ED Score (PEDS) was derived specifically for patients in an ED resuscitation room to predict death or intensive care unit (ICU) admission. We aimed to validate and refine this score, in comparison with other scores including the National Early Warning Score (NEWS).

Methods This was a single-centre prospective study of adult resuscitation-room patients over 3 months. Comparison of scores was made using receiver operating characteristic analysis. Physiological and blood test variables were compared according to the composite primary outcome: admission to ICU or death within 7 days of attendance. Multivariate logistic regression was used to derive a new prediction score, which was validated in comparison with NEWS using the historic dataset from which PEDS had been derived.

Results 234 patients were included; 37 were admitted to ICU or died within 7 days. PEDS performed adequately but was not superior to other scores. A simple pragmatic score, The Resuscitation Management score (THERM), was derived, which outperformed NEWS in the derivation and validation sets.

Conclusions PEDS is at least as good as other scores, including NEWS. However, it is unwieldy and relies on results not immediately accessible in the ED. THERM is a new score, derived and validated in an ED setting, using variables readily available, and simple to calculate and stratify. THERM outperforms NEWS and could be used in preference in critically ill ED patients.

  • emergency department
  • resuscitation
  • clinical assessment


Introduction

Prognostic scores are widely used in the emergency department (ED) to predict which patients are at risk of an adverse outcome. These scores can supplement clinical judgment in stratifying risk for individual patients, guiding referral and treatment decisions. They also enable case-mix adjustment for audit or research.1

There are many scores for use in specified conditions, such as the Ranson and Glasgow scores for pancreatitis, Blatchford and Rockall for upper gastrointestinal haemorrhage, CURB-65 for pneumonia, TIMI and GRACE for acute coronary syndrome. The use of trauma scoring is well established, with the probability of survival predicted by the Trauma Score Injury Severity Score used for interhospital and international comparisons of trauma outcomes. Several scores have been developed to assess critically ill patients, including the Acute Physiology and Chronic Health Evaluation II for use in the intensive care unit (ICU). However, none of these scores was developed specifically for use in the ED.1

The National Institute for Health and Clinical Excellence recommends use of physiological track and trigger systems to monitor adult hospital patients, and the Royal College of Physicians of London has recently advised the adoption by the UK's National Health Service of the National Early Warning Score (NEWS).2 ,3 NEWS was developed from ViEWS (VitalPAC EWS) which itself was derived from vital sign data from over 35 000 patients admitted to a medical assessment unit.4 NEWS classifies patients into three risk groups: low risk (score 0–4), medium (5–6, or any one parameter scoring the maximum), high (7 or more). These thresholds would trigger different levels of response. Although it is proposed that this be used throughout the patient journey, from prehospital to inpatient, it has not been validated in an ED.
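The three NEWS risk bands described above can be sketched in a few lines of Python. This is an illustration only: the function name and its input format are hypothetical, the scoring of each individual parameter is not reproduced, and the maximum single-parameter score of 3 is an assumption that is passed in explicitly.

```python
def news_risk_group(parameter_scores, max_parameter_score=3):
    """Map individual NEWS parameter scores to a risk band.

    parameter_scores: iterable of already-scored parameters (hypothetical
        input; how each vital sign is scored is outside this sketch).
    max_parameter_score: assumed maximum score for any single parameter.
    """
    scores = list(parameter_scores)
    total = sum(scores)
    # High risk: aggregate score of 7 or more.
    if total >= 7:
        return "high"
    # Medium risk: aggregate 5-6, or any one parameter at the maximum.
    if total >= 5 or any(s >= max_parameter_score for s in scores):
        return "medium"
    # Low risk: aggregate 0-4 with no single parameter at the maximum.
    return "low"
```

Note that a patient with a low aggregate score is still classed as medium risk if any single parameter reaches the maximum, which is why the function needs the individual scores rather than the total alone.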

The majority of UK EDs have been using the Modified Early Warning Score (MEWS).5 MEWS was developed in a UK medical admissions unit,6 but the same author did not find it to be of significant value in the ED.7 Other scoring systems suggested for use in the ED include the Simple Clinical Score (SCS)8 and the Rapid Emergency Medicine Score (REMS),9 which were derived in acute medical or ED admissions. The Mainz Emergency Evaluation Score (MEES) was designed to assess the effectiveness of prehospital care.10 The Mortality in the ED Sepsis (MEDS)11 and Worthing12 scores were derived from ED patients with sepsis. Goodacre's DAVROS score was derived in ED patients, but is a rather complex risk adjustment model and is not intended for bedside use.12

To our knowledge, The Prince of Wales ED Score (PEDS) is the only score designed for adult patients admitted to an ED resuscitation room for any cause. It was developed from a prospective study of resuscitation room patients in the ED of the Prince of Wales Hospital, Shatin, Hong Kong.13 It included six prognostic variables: systolic blood pressure (SBP), Glasgow coma score (GCS), blood glucose, serum bicarbonate (HCO3), leukocyte count and a history of metastases. The most important factors in the score were GCS<8, and HCO3<22 mmol/L. Low-risk, moderate-risk and high-risk patients were classified according to a PEDS <15, 15–29, >29, respectively. PEDS significantly outperformed Acute Physiology and Chronic Health Evaluation II, REMS, MEWS and Revised Trauma Score in predicting death or ICU admission within 7 days of ED attendance, but as yet there has been no validation study.

In addition, when the score was developed there was no rapid availability of lactate measurement, high levels of which are associated with mortality.14 PEDS is also quite complex, and difficult to calculate quickly in the resuscitation room. The inclusion of white cell count requires rapid turnaround from the laboratory, and knowledge of metastatic disease is not always readily obtainable.

Objectives

This study therefore aimed: (1) to validate PEDS in comparison with other prognostic scores (MEWS, SCS, REMS, MEES, MEDS, Worthing and NEWS); (2) to simplify and refine the score, using only variables that are immediately available in the resuscitation room (including lactate); (3) to validate this new score in the dataset previously used to derive the original PEDS.

Methods

Setting

This was a prospective observational study conducted in the ED of the Prince of Wales Hospital, Shatin, Hong Kong. The ED serves a population of approximately 800 000, receives over 150 000 new patients per annum and admits 25–30% of those attending.

Patients

Consecutive adult patients managed in the resuscitation room during weekdays over a 3 month period, ending January 2009, were included. Patients under 18 years of age, women in labour and those declared dead on arrival were excluded.

Data collection

Data were collected in real time by a dedicated research student and research nurse, and entered in a standardised form. Patient characteristics, medical history, physiological and biochemical data were collected as required for each scoring tool. Physiological data were those first measured in the ED by medical or nursing staff. All patients had blood samples taken, and 1 mL of the first sample obtained (arterial or venous) was analysed using point-of-care testing for glucose, sodium, potassium, bicarbonate, pH, lactate and haemoglobin. White cell count required laboratory analysis.

Outcome measures

The primary outcome was the occurrence of death or admission to intensive care within 7 days of ED attendance. A good outcome was defined as survival at 7 days without ICU admission. Secondary outcomes included 30 day mortality and hospital length of stay. Outcome data were obtained from the patient records. The Hong Kong Hospital Authority uses a territory-wide computerised clinical management system, which records every clinical episode in the public sector.
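As a minimal sketch, the composite primary outcome could be coded as follows. The function and argument names are hypothetical, and treating day 7 itself as inside the window is an assumption, since the text only says "within 7 days".

```python
from typing import Optional


def poor_outcome(days_to_death: Optional[float],
                 days_to_icu: Optional[float],
                 window_days: float = 7) -> bool:
    """Return True if death or ICU admission occurred within the window.

    Each argument is the number of days from ED attendance to the event,
    or None if the event did not occur (hypothetical encoding).
    """
    def within_window(days: Optional[float]) -> bool:
        # Assumption: the boundary day (day 7) counts as inside the window.
        return days is not None and 0 <= days <= window_days

    return within_window(days_to_death) or within_window(days_to_icu)
```

A good outcome, as defined above, is simply the negation: survival to 7 days without ICU admission.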

Statistical analysis

Step 1: prospective validation of PEDS.

Receiver operating characteristic (ROC) curves were used to describe and compare the performance of each predictive score.
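For readers unfamiliar with the measure, the area under a ROC curve equals the probability that a randomly chosen patient with a poor outcome has a higher-risk score than a randomly chosen patient with a good outcome. The sketch below computes it directly from that definition. It is an illustration in plain Python, not the MedCalc procedure used in this study, and the function name is hypothetical.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve from pairwise comparisons.

    scores: one risk score per patient; a higher value is assumed to
        mean higher risk.
    outcomes: truthy for a poor outcome, falsy for a good outcome.
    """
    poor = [s for s, y in zip(scores, outcomes) if y]
    good = [s for s, y in zip(scores, outcomes) if not y]
    if not poor or not good:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in poor:
        for g in good:
            if p > g:
                concordant += 1.0   # poor-outcome patient ranked higher
            elif p == g:
                concordant += 0.5   # ties count as half
    return concordant / (len(poor) * len(good))
```

For a score in which lower values indicate higher risk, the scores would need to be negated before being passed in.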

Step 2: development of a refined and simplified score, The Resuscitation Management score (THERM).

For the refinement and simplification of PEDS, continuous and categorical data were used. Categorical variables were predefined according to the definitions in the original PEDS study.13 Univariate analysis determined which parameters would be included in stepwise multivariable logistic regression, using unpaired t tests for continuous variables and χ2 tests for categorical variables. Independently significant variables were used to construct a new scoring tool, based on the ORs derived from logistic regression, and pragmatically according to ease of use.

Step 3: retrospective validation of THERM.

The dataset from the original PEDS study (data from 2006)13 was used retrospectively to test the refined and simplified score using ROC curves and by comparing the proportions of poor outcomes in each of the three risk groups.

Throughout, p<0.05 was considered statistically significant, and 95% CIs were calculated. Data were analysed with MedCalc (V.12.4.0.0, MedCalc Software bvba, Belgium).

Sample size

A nomogram for diagnostic studies15 was used to estimate a required sample size of 230 patients, based on the results of the PEDS study (α=0.05, expected prevalence 23%, required CI 7%, expected sensitivity 90%).

Results

Patient characteristics and case-mix

Two hundred and fifty-one patients were eligible and 17 refused consent. Two hundred and thirty-four patients were included in the study, of whom 37 (16%) had a poor outcome. Demographic and case-mix data were typical for Hong Kong resuscitation room patients, and similar to the previous study. Age ranged from 18 years to 101 years, mean (±SD) 65.8 (±18.1) years; 58.5% were male. The most frequent admitting specialties were general medicine (57%), general surgery (7%) and neurosurgery (9%). Other specialties admitted fewer than 5%. Diagnosis groups included cardiovascular (25%), neurological (15%), respiratory (14%), gastrointestinal (9%), trauma (8%). Other causes each accounted for fewer than 5%.

Step 1: prospective validation of PEDS.

Table 1 presents the results of ROC analysis. Worthing, MEES, PEDS and MEWS were significantly better than MEDS, but otherwise there was no statistically significant difference between scores.

Table 1

Comparison of prognostic scores according to ROC analysis (n=234)

Step 2: development of a refined and simplified score, THERM.

Appendix 1 (see online supplementary material) presents demographic details, physiological and laboratory variables, and prediction scores, according to outcome groups. Continuous variables are presented as means (±SD) with p values according to the t test. Categorical variables are presented as numbers (%) with ORs and p values according to the χ2 test.

GCS (range 3–15) and HCO3 (less than 22 mmol/L) were strongly related to outcome (GCS OR=0.8, p=0.0005; HCO3 OR=0.878, p=0.0028). For rule development, simple addition of GCS and HCO3 reflected the similar ORs for these continuous variables. As a categorical variable, SBP<100 mm Hg was most strongly related to outcome (OR=0.266, p=0.01). For rule development, subtraction of four from the previous sum reflected this OR for a dichotomous variable. The pH and oxygen saturation were of marginal statistical significance. Lactate was not independently predictive of outcome.

In order to create a simple, pragmatic rule it was decided to use only GCS, HCO3 and SBP. The THERM score was defined as:

$$\mathrm{THERM} = \mathrm{GCS} + \mathrm{HCO_3} - 4\ (\text{if hypotensive})$$

HCO3 is measured in mmol/L, to a maximum of 22. Hypotension is defined as SBP<100 mm Hg. The maximum THERM score is 37. High-risk, medium-risk and low-risk groups were defined as THERM ≤30, THERM 30.1–35, THERM 35.1–37, respectively.
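The rule can be sketched in Python as follows. The formula is reconstructed from the description in the text (GCS plus HCO3 capped at 22, minus four if SBP<100 mm Hg), and the function names are hypothetical. Scores falling between the published bands (for example 30.05) are assigned to the higher-risk side of each gap, which is an assumption.

```python
def therm_score(gcs, hco3_mmol_per_l, sbp_mm_hg):
    """THERM = GCS + HCO3 (capped at 22) - 4 if hypotensive."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    hco3 = min(hco3_mmol_per_l, 22)          # bicarbonate counts up to 22 only
    penalty = 4 if sbp_mm_hg < 100 else 0    # hypotension: SBP < 100 mm Hg
    return gcs + hco3 - penalty


def therm_risk_group(score):
    """Risk bands from the text: <=30 high, 30.1-35 medium, 35.1-37 low."""
    if score <= 30:
        return "high"
    if score <= 35:
        return "medium"
    return "low"


# Example: alert patient (GCS 15), HCO3 20 mmol/L, SBP 90 mm Hg.
example = therm_score(15, 20, 90)            # 15 + 20 - 4 = 31
example_group = therm_risk_group(example)    # "medium"
```

Because lower scores mean higher risk, a fully alert, normotensive patient with normal bicarbonate reaches the maximum of 37 and falls in the low-risk group.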

Area under the receiver operating characteristic curve (AUROC) for THERM was 0.84 (95% CI 0.786 to 0.884). This was significantly better than NEWS, REMS, SCS and MEDS. The advantage over Worthing, MEWS and MEES did not reach significance. Of the patients in the high risk group, 21/42 (50%) suffered a poor outcome; 12/60 (20%) in the medium-risk group; 4/132 (3%) in the low-risk group. Comparison between THERM and NEWS is described in table 2. In the identification of medium-risk and high-risk groups, THERM had superior specificity; there was no significant difference in sensitivity or predictive values.
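To show how sensitivity and specificity follow from the risk-group counts quoted above, the sketch below recomputes them for each cut-off. The figures are derived from the counts in the text rather than copied from table 2, and the function and variable names are hypothetical.

```python
def rule_in_performance(groups, flagged):
    """Diagnostic accuracy when the `flagged` groups count as test-positive.

    groups: dict mapping group name -> (poor_outcomes, total_patients).
    flagged: set of group names treated as a positive test.
    """
    tp = sum(poor for name, (poor, total) in groups.items() if name in flagged)
    fp = sum(total - poor for name, (poor, total) in groups.items() if name in flagged)
    fn = sum(poor for name, (poor, total) in groups.items() if name not in flagged)
    tn = sum(total - poor for name, (poor, total) in groups.items() if name not in flagged)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the derivation set (n=234) as reported in the text.
therm_2009 = {"high": (21, 42), "medium": (12, 60), "low": (4, 132)}

high_only = rule_in_performance(therm_2009, {"high"})
medium_or_high = rule_in_performance(therm_2009, {"high", "medium"})
```

On these counts the high-risk cut-off gives a sensitivity of about 57% and a specificity of about 89%, while flagging medium-risk and high-risk patients together gives about 89% and 65%, respectively.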

Table 2

Comparison of THERM and NEWS (n=234)

Step 3: retrospective validation of THERM

Using the 2006 dataset from which the original PEDS was derived, THERM performed similarly well. AUROC was 0.82 (95% CI 0.77 to 0.86), figure 1. Excluding PEDS, this was at least as good as the other scores assessed in that study, although only significantly better than REMS. Of the patients in the high risk group, 38/57 (67%) suffered a poor outcome; 21/78 (26.9%) in the medium-risk group; 18/195 (9.2%) in the low-risk group. Comparison between THERM and NEWS is described in table 3 and figures 2 and 3. In the identification of medium and high risk groups, THERM had superior specificity; there was no significant difference in AUROC, sensitivity or predictive values.

Table 3

Retrospective comparison of THERM and NEWS (2006 dataset, n=330)

Figure 1

Receiver operating characteristic curves for THERM (The Resuscitation Management score) and National Early Warning Score (NEWS) (2006 dataset, n=330).

Figure 2

Outcomes according to THERM (The Resuscitation Management score) risk groups (2006 dataset, n=330).

Figure 3

Outcomes according to National Early Warning Score (NEWS) risk groups (2006 dataset, n=330).

Discussion

PEDS failed to perform as well in the 2009 validation study as it had in the 2006 study; the AUROC fell from 0.91 (0.87 to 0.94) to 0.75 (0.69 to 0.81). A decrease in performance between derivation and validation is expected, as any derived rule will best reflect the dataset from which it is derived. This is especially the case with smaller studies, and reinforces the need to test a new rule in an independent validation set. PEDS still performed at least as well as the other seven scores, and significantly outperformed MEDS. MEDS was developed for use in patients with sepsis,16 which might explain why it performed less well in a more general resuscitation room setting. However, the Worthing score performed well, and was also designed for sepsis.11 PEDS performs adequately, but is still too complex for routine use in the ED, and requires information (leukocyte count, history of metastases) that might not be immediately available.

Lactate was not available in our ED when PEDS was derived.13 The addition of lactate has subsequently been found to improve the performance of ViEWS in critically ill medical admissions,17 but it was not independently predictive of outcome in this study. Bicarbonate was a far more important predictor, and is more likely to be available at point of care in the ED than lactate. THERM is a simple score based on the addition of GCS to HCO3, with an adjustment for hypotension. In the pursuit of simplicity and elegance it might appear counterintuitive to design a rule with a maximum score of 37. However, the number is familiar to clinicians as normal body temperature, 37°C, and the high and medium risk groups reflect a similar range of deviance from normal as does hypothermia. Mild hypothermia is <35°C, severe <30°C.18 Keeping THERM simple came at the expense of some accuracy. It would have been possible to design a more precise rule using other variables from the multivariable logistic regression (including pH and SpO2, or using mean arterial pressure (MAP) instead of SBP). However, this would have made the rule less easy to use, and would not have guaranteed any advantage on validating the rule in an independent dataset.

As it happens, although THERM may appear simplistic it still performs well, and performed equally well in the validation set. In particular, it performed at least as well as NEWS, and was significantly more specific in both datasets. This might be partly because NEWS was derived in a different patient group (medical assessment unit patients in the UK).3 ,4 NEWS is also intended for ongoing monitoring of inpatients rather than a first-look snapshot in the ED. NEWS is entirely physiological, without any need for bedside blood tests. It is not unexpected therefore that a rule derived in resuscitation room patients, using point-of-care tests available to the Emergency Physician, would perform better than NEWS in the ED.

The scoring of some of the parameters in NEWS also limits its usefulness in the ED. Use of supplemental oxygen scores highly, which is reasonable for stable inpatients. But many ambulance and resuscitation-room patients are routinely given oxygen initially, with subsequent titration or removal. To include this as part of the first-look score in the ED would limit its discriminatory function. NEWS does not discriminate between degrees of reduced consciousness; anything below ‘alert’ on the alert, to voice, to pain, or unconscious (AVPU) scale indicates medium risk, regardless of other parameters. Again, in the resuscitation room this would have little discriminatory value. The appropriateness of using NEWS in ED patients (and in the prehospital setting) needs to be questioned.

It is also important to note that, although the AUROC for THERM was not statistically significantly better than that of NEWS in the validation set, the specificity of THERM was better than that of NEWS at both cut-offs (high-risk and medium-risk). THERM is not intended as a rule-out tool, but as a rule-in tool. Patients identified as being at risk would merit further senior review before admission (after which, it would be more appropriate to use NEWS for ongoing monitoring on the ward). For a rule-in tool, the specificity at the cut-off point is more important than the sensitivity or the overall AUROC.

Although NEWS is now recommended in the UK, MEWS6 has been the most commonly used track and trigger system in UK EDs,5 as well as in other countries. This and our previous study13 have shown similar results for MEWS, with AUROC 0.73 (this study) and 0.76 (in the 2006 data). THERM performed at least as well as MEWS, but the advantage was not statistically significant. Like NEWS, MEWS is entirely physiological and intended for ongoing monitoring rather than as a snapshot in the resuscitation room. It is also too complex to be able to calculate rapidly.

In this study, the outcome was a composite of death and/or ICU admission within 7 days. Other studies (including ViEWS, on which NEWS was based), used mortality.3 ,4 Some of those who died within 7 days might not have done so had they been admitted to ICU; some who are admitted to ICU would have died had they not been admitted. In other words, ICU admission and death are not independent outcomes. By using this composite outcome, we hoped to minimise the effect that different ICU admission-policies would have on the death rate. From the perspective of the ED clinician, any patient who dies or needs ICU is very unwell, and this is the group that THERM seeks to identify.

THERM is intended to supplement, not replace, clinical judgment. Ideally, every patient in the resuscitation-room would be assessed by a senior clinician already. Those with lower THERM scores (higher risk) should be considered for referral to critical care, similar to the recommendations for inpatients identified at risk by NEWS. The predisposition, infection, response and organ failure (PIRO) score (a sepsis staging system developed for patients admitted to hospital) has recently been found to add little value over ED clinical judgment in guiding admission to the appropriate level of care,19 and it would be useful to compare THERM with emergency physician judgment, in a future study. THERM could also be used for audit and research, to establish case-mix in a particular ED. There may also be scope for assessing the usefulness of THERM in other ED patients, particularly boarders whose admission to inpatient wards is delayed beyond a few hours.

Limitations

The study was conducted on weekdays, because that was when the researchers were available to obtain real-time measurements. However, THERM performed similarly when applied to the 2006 dataset, which was compiled by prospective collection of data from case notes for patients attending at all times throughout the week. It is therefore unlikely that THERM is significantly biased by weekday-only recruitment.

The validation set was historical data, rather than a new prospective study. It was also within the same institution. There needs to be a prospective, external validation study of THERM.

For THERM to be rapidly available in the resuscitation room, we considered it preferable to use the first sample obtained whether arterial or venous. There is a theoretical small difference between arterial and venous values of bicarbonate, pH and lactate, but previous studies have shown good agreement for these in ED and ICU patients.20 ,21

Conclusion

PEDS did not perform as well in the validation study as it had previously, but it was still at least as good as other scores, including NEWS. However, it is unwieldy and relies on results not immediately accessible in the ED. THERM is a score for use in ED patients, derived and validated in an ED setting, using variables readily available in the ED, and simple to calculate and stratify. It outperformed NEWS in the derivation and validation sets, and could be used in preference in critically ill ED patients.

Footnotes

  • Contributors All authors contributed to the conception and design of this study. EL took the lead role in data collection, under the supervision of GC. EL and GC conducted the analysis. GC wrote the draft of the paper which was contributed to substantially by CG and TR, and reviewed by EL. TR is the guarantor.

  • Competing interests None.

  • Ethics approval Chinese University of Hong Kong.

  • Provenance and peer review Not commissioned; externally peer reviewed.
