Predictive scoring in non-trauma emergency patients: a scoping review
Kirsty Challen, Steve W Goodacre
Health Services Research, ScHARR, University of Sheffield, Sheffield, UK
Correspondence to Dr Challen, 84 Whitbarrow Road, Lymm, WA13 9BA; kirstychallen{at}


This study is an inclusive scoping review of the literature on outcome prediction in adult non-trauma emergency patients. It aimed to identify the number and range of risk scores developed for acutely ill adults and the outcomes these scores predict. The data source was Medline 1950–2009. To be eligible for inclusion, papers had to detail an assessment tool, wholly or predominantly clinical, applied at the point of patient presentation to unscheduled healthcare services, with outcome measures up to 30 days after presentation. Papers were excluded if they detailed trauma, paediatrics, purely obstetric or psychiatric presentations, tools applied wholly in a critical care setting, tools requiring an algorithm not freely available, or biomarkers or tests not routinely available in an Emergency Department (ED) setting. In total, 192 papers were reviewed. Within 17 broad disease categories, 80 inclusion criteria were used, 119 tools were assessed (25 of which were non-disease-specific), and 51 outcome measures were used (30 of which were disease-specific). The areas under the receiver-operator characteristic curve (AUROCs) varied from 0.44 to 0.984. The multiplicity of tools available presents a challenge in itself to the acute clinician. Many tools require a specific diagnosis, which is not immediately available, and the authors advocate ED development of tools for case-mix adjustment and clinical risk stratification.

  • Clinical assessment
  • major incidents
  • clinical care
  • prehospital care
  • clinical management


Risk scores may be used to predict which non-trauma patients presenting to an Emergency Department (ED) are likely to suffer adverse outcomes. They have two broad purposes within clinical medicine: 1. to guide individual patient management by risk stratification, to determine best site-of-care, to place a ceiling on intensity of intervention, to decide if palliation is appropriate and to support information provided to patients and relatives; and 2. to provide case-mix adjustment for research and audit.

The use of standardised tools to inform site-of-care decisions is most advanced in the prehospital management of trauma; a number of rules have been proposed to identify major trauma patients who need direct transfer to a specialised trauma centre or the presence of a full trauma team.1–5 The use of standardised alert systems in hospital has recently been advocated by the UK National Institute for Health and Clinical Excellence to identify the acutely ill patient and ensure the appropriate level of care.6

The science of risk prediction and case-mix adjustment is advanced in trauma and critical care. A multiplicity of predictive tools exists in the critical care literature (APACHE I–IV,7–10 Mortality Probability Model I–III,11–13 Simplified Acute Physiology Score I14 and II15), together with refinements based on changes of those scores over time.16–19 In the UK,20 21 Australasia,22 Europe23–25 and the USA,26 various audit groups provide analysis to aid comparison between different units. In the USA and the UK, multi-site data collection (the American College of Surgeons Trauma Quality Improvement Programme27 and the Trauma Audit Research Network28) is ongoing to provide risk-adjusted mortality ratios to assist in quality assurance at individual care providers.

The absence of similar tools in non-trauma patients causes problems in risk prediction and case-mix adjustment. Patients with delayed admission to critical care areas have higher rates of mortality than those admitted directly from the ED.29 30 Not all patients require admission to hospital or critical care, but the lack of a good indicator of future deterioration may engender defensive practice and unnecessary admissions. The lack of a valid tool for case-mix adjustment also causes problems in our era of league tables. Crude mortality estimates may reflect case mix rather than quality of care, and risk-adjustment may be subject to the ‘constant risk fallacy’.31 Failure to take these factors into account can lead to inappropriate conclusions being drawn about the association between quality of care and mortality.32
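To illustrate why crude mortality can mislead, the minimal sketch below compares two hypothetical units using a standardised mortality ratio, that is, observed deaths divided by the deaths expected from each patient's predicted risk. The unit names, predicted risks and death counts are invented for illustration and are not taken from any study in this review; the per-patient predicted risk is assumed to come from some validated risk score.

```python
from dataclasses import dataclass
from math import fsum
from typing import Sequence


@dataclass
class Unit:
    """A hypothetical care provider (illustrative only)."""
    name: str
    predicted_risks: Sequence[float]  # assumed per-patient predicted probability of death
    deaths: int                       # observed deaths among those patients


def crude_mortality(unit: Unit) -> float:
    """Observed deaths divided by number of patients, with no adjustment."""
    return unit.deaths / len(unit.predicted_risks)


def standardised_mortality_ratio(unit: Unit) -> float:
    """Observed deaths divided by expected deaths (sum of predicted risks)."""
    expected_deaths = fsum(unit.predicted_risks)
    return unit.deaths / expected_deaths


# Invented case mixes: unit A sees low-risk patients, unit B high-risk patients.
low_risk_unit = Unit("A", [0.02] * 100, deaths=3)
high_risk_unit = Unit("B", [0.10] * 100, deaths=9)

for unit in (low_risk_unit, high_risk_unit):
    print(
        f"Unit {unit.name}: crude mortality {crude_mortality(unit):.1%}, "
        f"SMR {standardised_mortality_ratio(unit):.2f}"
    )
```

In this invented example unit B has three times the crude mortality of unit A, yet fewer deaths than its case mix would predict, while unit A has more. The sketch assumes the predicted risks mean the same thing in both units, which is the assumption the 'constant risk fallacy' calls into question.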

Attempts to implement risk-prediction methods in clinical decision-making, audit and research are hampered by the substantial range and number of risk scores available. There are so many potential scores for non-trauma patients that deciding which score should be used and which variables should be measured presents a challenge in itself. Therefore, this study aimed to carry out a scoping review of the literature relating to outcome prediction in adult non-trauma emergency patients, in order to identify the number and range of risk scores developed for acutely ill adults and to identify the outcomes these scores predict.


The aim was to identify papers describing assessment tools applied at the point of patient presentation to unscheduled healthcare services (excluding trauma, paediatrics and purely obstetric or psychiatric presentations) and describing short-term outcomes. A search of Medline 1950 to October week 3 2009 was carried out using a deliberately inclusive two-pronged strategy (tables 1 and 2). The search was designed to achieve breadth rather than depth: it was intended to determine the scope of risk scores available, rather than to obtain accurate estimates of the performance of each score.

Table 1. Previously identified severity scores for non-trauma patients searched for by name and/or common abbreviation

Table 2. Search strategy for prognostic indicators

All searches were limited to English language, humans and adults. Search output was limited by title, abstract or full paper review to papers meeting three criteria: (1) a wholly or predominantly clinical assessment (ie, not biomarkers or specialist tests not available in the majority of EDs, such as myocardial scintigraphy); (2) an adult population; and (3) an outcome measure up to 30 days after presentation. Assessment tools requiring a specialist algorithm not freely available, or applied only to patients in a critical care setting, were also excluded.

The following data were extracted from each article selected for inclusion: the name and/or acronym of the score, the target condition or conditions, the patient groups included in the target condition(s), the main outcomes measured and the discriminant value of the score, expressed as the area under the receiver-operator characteristic curve (AUROC) or sensitivity and specificity. The AUROC is also known as the c-statistic. It is the probability that a randomly selected patient from those with the outcome of interest will have a higher score than a randomly selected patient without the outcome of interest. A score with a c-statistic of 0.5 or less has no value for discriminating which patients will suffer the outcome of interest. Similarly, a dichotomised score for which the sensitivity and specificity add up to 100% or less has no discriminatory value.
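The probabilistic definition of the AUROC given above can be computed directly by comparing every patient with the outcome against every patient without it. The following is a minimal sketch of that calculation, together with sensitivity and specificity for a dichotomised score. The patient scores and the threshold are invented for illustration, and counting tied scores as half a "win" is a common convention assumed here rather than something stated in the text.

```python
from itertools import product
from typing import Sequence, Tuple


def auroc(with_outcome: Sequence[float], without_outcome: Sequence[float]) -> float:
    """Probability that a randomly chosen patient with the outcome has a
    higher score than a randomly chosen patient without it.
    Tied scores count as half (an assumed, commonly used convention)."""
    if not with_outcome or not without_outcome:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for pos, neg in product(with_outcome, without_outcome):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(with_outcome) * len(without_outcome))


def sensitivity_specificity(
    with_outcome: Sequence[float],
    without_outcome: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Dichotomise the score: a score at or above the threshold is 'positive'."""
    sensitivity = sum(s >= threshold for s in with_outcome) / len(with_outcome)
    specificity = sum(s < threshold for s in without_outcome) / len(without_outcome)
    return sensitivity, specificity


# Invented scores for patients who died and patients who survived.
died = [7, 5, 6, 3]
survived = [2, 4, 3, 1, 5]

print("AUROC:", auroc(died, survived))
sens, spec = sensitivity_specificity(died, survived, threshold=5)
print("sensitivity + specificity:", sens + spec)
```

With these invented scores, 17 of the 20 patient pairs (counting ties as half) favour the patient who died, giving an AUROC of 0.85. At a threshold of 5, sensitivity is 3/4 and specificity is 4/5, so the two sum to more than 100% and the dichotomised score retains some discriminatory value.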

It was not planned to synthesise data, but to present descriptive data outlining the breadth of scores available for different conditions, the outcomes measured and the range of AUROC values reported.


The initial searches identified 14 659 (method 1) and 46 605 (method 2) titles. A substantial number of titles were identified by more than one search. Six hundred and eighty-two (method 1) and 1661 (method 2) abstracts were screened, and 192 papers were deemed to fit the inclusion criteria.

Scoring systems were available for 17 broad conditions. Within these 17 conditions, 80 different inclusion criteria were used (table 3).

Table 3. Inclusion criteria

One hundred and nineteen tools were assessed (table 4). Of these, 25 were generic (non-disease-specific). A number of tools were assessed in multiple disease categories.

Table 4. Tools assessed

Fifty-one different outcome measures were used (table 5). Of these, 30 were disease-specific.

Table 5. Outcome measures

A variety of different measures were used to report score performance. Of 247 analyses using death as an outcome, 190 reported an AUROC, of which 69 reported an AUROC greater than 0.8. Of 215 analyses not including death as an outcome, 151 reported an AUROC, of which 30 reported an AUROC greater than 0.8. Twenty-two studies used the same dataset to compare the predictive value of a single tool for different outcomes (table 6). Across the included studies, the lowest AUROC was 0.44 (PIMI for predicting hospital death in patients with acute myocardial infarction204) and the highest was 0.984 (APACHE II for predicting hospital death in patients with peritonitis177). It is generally accepted that an AUROC of over 0.8 represents good discriminatory capacity.226
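The counts above are easier to compare as proportions. The short sketch below simply restates the reported counts as percentages; the dictionary keys and the helper name `percentage` are illustrative choices, and no data beyond the figures already given in the text are used.

```python
# Counts as reported in the text; the key names are illustrative only.
analyses = {
    "death as an outcome": {"total": 247, "reported_auroc": 190, "auroc_over_0_8": 69},
    "death not an outcome": {"total": 215, "reported_auroc": 151, "auroc_over_0_8": 30},
}


def percentage(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for label, counts in analyses.items():
    reporting = percentage(counts["reported_auroc"], counts["total"])
    good = percentage(counts["auroc_over_0_8"], counts["reported_auroc"])
    print(
        f"{label}: {reporting}% of analyses reported an AUROC; "
        f"{good}% of those exceeded 0.8"
    )
```

On these figures, roughly 36% of the mortality analyses that reported an AUROC exceeded 0.8, compared with roughly 20% of the analyses of other outcomes.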

Table 6. Studies with comparison of different outcome measures

Studies were variously purely derivation, mixed derivation and validation, external validation and secondary analysis of other datasets (including disease registries) (table 7).

Table 7. Source of datasets


A wide variation in the patient groups to which scoring systems are applied has been demonstrated, and an equally wide variation in patient outcomes considered relevant. The sheer number of available tools makes it impossible for the working clinician to use more than a few in daily practice. The discriminant value of the scores, expressed as an AUROC or sensitivity and specificity, often varies between studies and is poor in many cases, suggesting the score will have limited value in practice. Furthermore, most scores have only been tested in the population in which they were developed. This will tend to overestimate the discriminatory value and further reduce the value of the scores in practice.

The authors are not aware of any previous systematic reviews that have attempted to characterise the full scope of risk scores available for non-trauma patients. Although there is obviously a huge amount of primary data relating to risk scores, there have been few attempts to systematically evaluate these data and draw broader conclusions for clinical practice. Indeed, one of the characteristics of the literature relating to risk scores is that each risk score seems to be developed de novo with very little reference to previous studies or other scores. This may reflect the tendency for studies developing risk scores to be secondary analyses of existing datasets rather than studies undertaken for the primary purpose of developing a risk score. The present review suggests that further unfocussed primary research is unlikely to clarify the situation. Instead, future studies of risk scores should aim to build on existing data and be designed specifically to develop an optimal risk score.

The study is limited by the structure and the lack of information in many included papers. Few were precise about the timing of the assessment, leaving potential for lead-time bias. The majority focused on hospital-specific outcomes, and it is often unclear to what extent patient-relevant out-of-hospital outcomes have been investigated. The often restricted nature of patient sets (eg, requiring consultant radiologist confirmation for the diagnosis of pneumonia) limits the generalisability of many of the results to the day-to-day ED population where formal diagnosis is often not known initially; only four papers could be identified assessing a truly unselected group of ED patients.189 190 192 193

Although a number of reviews have analysed the performance of systems identifying high-risk inpatients,227–229 the authors are unaware of any previous review of similar tools available to the ED clinician.

It is apparent that one outcome measure does not fit all; in the limited literature assessing the performance of the same tool for two different outcomes, the results rarely matched. Clinicians must therefore examine their practice and decide which outcomes are relevant to their patients and situation. It is highly unlikely that a tool developed for case-mix adjustment will perform equally well at clinical risk stratification; currently the ED community lacks a tool for either and both should be developed. It is likely, given the heterogeneity of ED patients, that it will be challenging to develop a single overall predictive tool; it may be that a variable of presenting complaint (along the lines of APACHE) will be required in such a tool for it to be of benefit in simplifying risk prediction for the practising Emergency Physician.



  • Data sharing: Dataset available from KC at kirstychallen{at}

  • Funding: SG is an employee of the University of Sheffield. KC is funded by a Medical Research Council PhD studentship. Neither body had any role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

  • Competing interests: All authors have completed the Unified Competing Interest form at (available on request from the corresponding author) and declare that (1) no authors have support from any commercial company for the submitted work; (2) no authors have relationships with any commercial company that might have an interest in the submitted work in the previous 3 years; (3) their spouses, partners, or children have no financial relationships that may be relevant to the submitted work; and (4) no authors have non-financial interests that may be relevant to the submitted work.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
