

Limitations in validating emergency department triage scales
Michele Twomey¹, Lee A Wallis², Jonathan E Myers³

¹ School of Public Health, University of Cape Town, Cape Town, South Africa
² Division of Emergency Medicine, University of Cape Town & Stellenbosch University, Cape Town, South Africa
³ Occupational and Environmental Health Research Unit, University of Cape Town, Cape Town, South Africa
Correspondence to: Michele Twomey, School of Public Health, University of Cape Town, Cape Town, South Africa; satriage{at}


Objective: To examine whether current validation methods of emergency department triage scales actually assess the instrument’s validity.

Methods: Optimal methods of validating emergency department triage scales in developed countries are examined, and their application to developing countries is considered.

Results and conclusion: Numerous limitations are embedded in the process of validating triage scales. Methods of triage scale validation used in developed countries may not be appropriate or repeatable in developing countries, and even in developed countries there are problems in conceptualising validation methods. A new consensus-building validation approach has been constructed and recommended for a developing country setting. The Delphi method, a consensual validation process, is advanced as a more appropriate alternative for validating triage scales in developing countries.

  • ATS, Australasian Triage Scale
  • CTAS, Canadian Triage and Acuity Scale
  • ED, emergency department
  • ESI, Emergency Severity Index
  • ETAT, Emergency Triage Assessment and Treatment
  • MEWS, Modified Early Warning Score


Emergency department (ED) triage is the process of sorting and filtering patients based on medical priority. It aims to determine a patient’s acuity level in order to facilitate timely and effective care before their condition worsens. A patient’s acuity level is defined as the urgency with which the patient needs effective care. In the ED triage setting, effective care is defined as the provision of an intervention or treatment that reduces the patient’s urgency for care or prevents clinical deterioration.1 If patients receive timely and effective care, triage has achieved its purpose (as seen at point A in fig 1).

Figure 1

 Triage scales are most valuable at (A), where triage facilitates optimal time to highly effective care, and least valuable at (B), where triage facilitates delayed time and ineffective care.

This illustration of triage is a highly simplified approach to a complex set of interrelationships. Additional variables, such as variability in triage nurse decisions, may significantly influence both the optimal time to care and the effectiveness of care.


The evaluation of a triage tool involves assessing reliability and validity.2 Reliability refers to the degree to which repeated assessments of the same patient with a triage instrument will deliver the same acuity level. Inter-rater reliability determines whether there is significant variability between different triage officers rating the same patient, and intra-rater reliability assesses the variability within a single triage officer re-rating the same patient. Reliability makes no reference to a criterion, and so only illustrates consistency when triage is repeated. It says nothing about the instrument’s validity (whether its result reflects the truth). A measure can therefore be highly reliable without being valid.3

Reliability can be estimated by evaluating different types of agreement. Percentage agreement, the κ coefficient and the weighted κ coefficient are three common ways of measuring agreement between raters,4 but these measures can generate quite different values. Measuring only the percentage agreement is not recommended because it does not take into account agreement expected by chance alone.5 The κ coefficient considers both the percentage agreement between raters and the percentage agreement expected by chance; unfortunately, it does not take into account the magnitude of disagreement, which matters for ordinal data such as triage levels. As a result, the weighted κ coefficient has become the statistic of choice, as it assigns different weights of agreement according to the magnitude of disagreement and enables more explicit comparisons between studies.4 While the majority of triage research has focused on inter-rater and intra-rater reliability, which has its uses, it is of greater importance to determine whether a triage tool is in fact valid. We will therefore focus on the validity of a triage tool rather than its reliability.
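To make the difference between these three agreement measures concrete, a minimal Python sketch follows. It assumes a hypothetical five-level ordinal scale (level 1 most urgent) and ten invented ratings of the same patients by two hypothetical nurses; the function names are illustrative only. Linear disagreement weights are used here as one common choice; quadratic weights are also widely used.

```python
def percentage_agreement(a, b):
    """Share of patients assigned the same acuity level by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, levels, weight="linear"):
    """Cohen's kappa for two raters, optionally weighted for ordinal data.

    weight="none"      every disagreement is penalised equally (plain kappa)
    weight="linear"    penalty grows with the number of levels apart
    weight="quadratic" penalty grows with the square of that distance
    """
    k = len(levels)
    idx = {level: i for i, level in enumerate(levels)}
    n = len(a)

    # Observed joint proportions: rows = rater A's level, columns = rater B's.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n

    # Marginal proportions for each rater, used for chance-expected agreement.
    p_a = [sum(row) for row in obs]
    p_b = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        if weight == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement_weight(i, j) * obs[i][j] for i, j in cells)
    expected = sum(disagreement_weight(i, j) * p_a[i] * p_b[j] for i, j in cells)
    # kappa = 1 - (weighted observed disagreement / weighted chance disagreement)
    return 1 - observed / expected


# Hypothetical ratings of the same ten patients by two triage nurses.
levels = [1, 2, 3, 4, 5]  # 1 = most urgent
nurse_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
nurse_b = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5]

print(round(percentage_agreement(nurse_a, nurse_b), 2))              # 0.7
print(round(weighted_kappa(nurse_a, nurse_b, levels, "none"), 2))    # 0.62
print(round(weighted_kappa(nurse_a, nurse_b, levels, "linear"), 2))  # 0.79
```

With these invented ratings every disagreement is a single level apart, so the linearly weighted κ (about 0.79) is higher than the unweighted κ (about 0.62), while percentage agreement (0.70) makes no allowance for chance at all.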


Validity refers to the degree to which the measured acuity level reflects the patient’s true acuity at the time of triage. The term valid implies that there is some sort of external reference or “gold standard” which by definition has absolute accuracy.3 Studies that aim to see how closely an instrument approximates the truth test criterion validity. Unfortunately, it is not possible to measure the truth for patient acuity,6 as myriad events can occur between the time a patient presents to the ED and the time of discharge (including the length of time to initiation of care, the quality of that care, and non-medical factors influencing disposition, such as social factors). As a result, surrogate outcome markers have been used as criteria to assess validity, which has led to other ways of assessing the validity of ED triage tools. The two most commonly found in the literature are tests of predictive and consensual validity. Streiner and Norman approach these in a unifying manner, reconceptualising a variety of notions of validity commonly used in the literature as construct validity.7

There is a hierarchy of validity testing in which criterion validity is the strongest (table 1). Streiner and Norman have shown that, unlike in the traditional classification of validity, predictive, consensual and other types of validity can all be seen as variants of construct validity.7 In developed countries, criterion validity methods are typically used.

Table 1

 Traditional validity testing versus Streiner and Norman’s framework

We will use Streiner and Norman’s conceptual framework to answer the following questions:

  • Do current methods of triage tool validation actually assess validity, and what limitations underlie these methods?

  • How can these limitations be overcome with special reference to developing countries?


A number of different triage systems are used in developed countries. To date, four reliable ordinal ED triage scales have been researched and published: the Australasian Triage Scale (ATS),8 the Canadian Triage and Acuity Scale (CTAS),9 the Emergency Triage Scale (aka Manchester Triage Scale)10 and the Emergency Severity Index (ESI).11 While there has been some focus on the reliability of triage tools, little has been published on their validity. Predictive validity (a type of construct validity) is the most frequently used method of assessing tools.12 It considers the degree to which the triage acuity level is able to predict true acuity. Particular outcomes, or events with time-ordering, are selected as surrogate markers (such as mortality rates, hospital admission rates, resource utilisation, and length of stay in hospital). This type of validity has methodological problems, as it does not always answer the core question: “Is the triage instrument able to measure what it is supposed to measure?” It does not measure the patient’s acuity at the time of assessment, and it is inherently confounded by the effectiveness of the health care intervention.

Examples of predictive validity testing abound in the triage literature, as surrogate outcome markers are practical to measure and are claimed to be closely associated with true acuity.3 This has encouraged clinicians and researchers to use triage instruments as prediction tools. However, our ability to identify and measure the relationship between patient acuity level and outcome depends not only on how the surrogate outcome marker and the patient’s acuity level are measured, but also, very importantly, on confounding variables such as variability in triage nurse decisions and delayed or ineffective treatment, all of which may affect the surrogate outcome marker.
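As an illustration of what a predictive validity analysis typically computes, the Python sketch below tabulates one surrogate marker (hospital admission) by triage level and checks for the expected gradient. The records are invented and the function names are hypothetical. A clean gradient like this one still says nothing about acuity at the time of triage, and the sketch makes no attempt to adjust for the confounders discussed above.

```python
from collections import defaultdict


def admission_rate_by_level(records):
    """Proportion of patients admitted at each triage level.

    records: iterable of (triage_level, admitted) pairs; level 1 = most urgent.
    """
    totals = defaultdict(int)
    admitted = defaultdict(int)
    for level, was_admitted in records:
        totals[level] += 1
        admitted[level] += int(was_admitted)
    return {level: admitted[level] / totals[level] for level in sorted(totals)}


def is_monotone_decreasing(rates):
    """True if the admission rate never rises as levels become less urgent."""
    values = [rates[level] for level in sorted(rates)]
    return all(a >= b for a, b in zip(values, values[1:]))


# Invented records: (triage level, admitted to hospital?)
records = (
    [(1, True)] * 8 + [(1, False)] * 2
    + [(2, True)] * 5 + [(2, False)] * 5
    + [(3, True)] * 2 + [(3, False)] * 8
)
rates = admission_rate_by_level(records)
print(rates)                          # {1: 0.8, 2: 0.5, 3: 0.2}
print(is_monotone_decreasing(rates))  # True
```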


A detailed literature review revealed that very little has been published on triage in developing countries. The World Health Organization reports that triage research is not a priority in low- to middle-income countries.13 It has accordingly developed the Emergency Triage Assessment and Treatment (ETAT)14 for application in developing countries. While this subjective system has been successfully implemented in Malawi, countries such as India, Brazil and South Africa have sought a more objective triage instrument based on physiology. They have either adopted a triage instrument from a developed country or modified one to their own local context and needs (Patriacia Neto, Quinta D’or Hospital, Rio de Janeiro, May 2007, personal communication). South Africa has adapted the Modified Early Warning Score (MEWS) as the South African Triage Scale after validating it in the local population.15 Some areas of Brazil have adopted the CTAS, others the ESI.

During any validity testing an important distinction needs to be made between internal validity (which refers to inferences about the source population) and external validity (whether inferences may be generalised to people outside the source population).16 A triage tool designed for a developed country may be valid in that context, producing favourable results that are meaningful and have implications for action. If, however, the same triage tool were applied in a developing country, results may vary because of different resources and skills. Similarly, results may vary when surrogate markers from developed countries are used to undertake validity testing in developing countries. This variability may increase the random error in both the triage acuity level and the outcome category. It would therefore be more appropriate to apply a locally developed tool that is meaningful in the local context (has internal validity), even if it may not be applicable in a developed country (lacks external validity).

Whichever tool is used, an assessment of its usefulness in these settings is required. When selecting surrogate outcome markers (such as mortality rates, hospital admission rates, resource utilisation, and length of stay in hospital), it is assumed that there is systematic record keeping, and that the care given is effective. While this may often be the case in developed countries, it is typically not the case in developing countries. Poor record keeping and ineffective care may have significant effects on surrogate outcome markers and patients’ final dispositions. Markers such as these are imperfect measures of patient acuity in the developing world. It is thus important to identify and measure all confounding variables that may be affecting the surrogate outcome marker: given the poor record keeping and lack of efficiency, this is unlikely to be feasible in developing countries.

Delphi methodology

The Delphi method was developed in the 1950s by the RAND Corporation in California, USA.17 The technique has since diversified and is now applied in the mainstream social sciences, in business and, over the last two decades, in healthcare.18 It is a consensus-building technique designed to gain insight into a particular field and so enable decision making in areas where published information is inadequate or non-existent.19 The Delphi approach establishes a panel of appropriate experts who have agreed to complete an iterative process on a particular issue, with the key objective of reaching consensus.20 Panellist anonymity is maintained throughout, and controlled feedback is provided after each iterative round, resulting in a statistical aggregation of the group response.18
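The round structure described above can be sketched in Python for a single case. This is a minimal illustration only: the consensus rule (at least 70% of panellists choosing the same acuity level), the use of the median and interquartile range as feedback statistics, and all ratings are assumptions made for the example rather than a prescribed Delphi protocol; definitions of consensus vary between Delphi studies.

```python
import statistics


def summarise_round(ratings, threshold=0.7):
    """Aggregate one Delphi round for a single item, e.g. one patient vignette.

    ratings   : acuity levels from each (anonymous) panellist, 1 = most urgent
    threshold : hypothetical cut-off; consensus is declared when at least this
                share of panellists chose the most common level
    """
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    modal_level = statistics.mode(ratings)
    agreement = ratings.count(modal_level) / len(ratings)
    return {
        "median": statistics.median(ratings),
        "iqr": q3 - q1,
        "modal_level": modal_level,
        "agreement": agreement,
        "consensus": agreement >= threshold,
    }


def run_delphi(rounds, threshold=0.7):
    """Step through successive rounds until the item reaches consensus.

    In a real study panellists revise their ratings after seeing the summary
    (controlled feedback); here the later rounds are simply supplied.
    Returns (round number, summary), or (None, last summary) if no consensus.
    """
    summary = None
    for number, ratings in enumerate(rounds, start=1):
        summary = summarise_round(ratings, threshold)
        print(f"Round {number}: median={summary['median']}, "
              f"IQR={summary['iqr']}, agreement={summary['agreement']:.0%}")
        if summary["consensus"]:
            return number, summary
    return None, summary


# Invented ratings from ten panellists for one vignette over two rounds.
vignette_rounds = [
    [2, 2, 3, 2, 1, 3, 2, 4, 2, 3],  # round 1: 50% chose level 2
    [2, 2, 2, 2, 1, 2, 2, 2, 2, 3],  # round 2, after feedback: 80% chose level 2
]
reached_in, result = run_delphi(vignette_rounds)
print(reached_in, result["modal_level"])  # 2 2
```

The consensus level reached for each vignette would then serve as the panel's reference acuity for that case.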

The Delphi method is another form of construct validity that may be useful when assessing triage scales in developing countries. It allows the development of a surrogate “gold standard” determined by specialist panel consensus. The triage tool’s validity may then be tested against this consensually derived construct of true underlying acuity. There appear to be very few examples in the world literature that elaborate on the use of this form of construct validity.
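Once a panel has agreed on a reference acuity level for each case, testing a tool reduces to comparing two ordinal ratings per case. The Python sketch below assumes level 1 is the most urgent, treats levels 1 and 2 as “urgent” (a hypothetical cut-off) and uses invented data. Exact-match, over-triage and under-triage rates are one common way of summarising such a comparison; a weighted κ between tool and panel levels would be an alternative.

```python
import math


def compare_to_consensus(tool_levels, consensus_levels, urgent_cutoff=2):
    """Compare a triage tool's levels with a Delphi-panel reference standard.

    Assumes level 1 is most urgent, so a smaller number means higher acuity.
    urgent_cutoff is hypothetical: levels <= cutoff are treated as urgent.
    """
    pairs = list(zip(tool_levels, consensus_levels))
    n = len(pairs)
    exact = sum(t == c for t, c in pairs) / n
    over_triage = sum(t < c for t, c in pairs) / n   # tool more urgent than panel
    under_triage = sum(t > c for t, c in pairs) / n  # tool less urgent than panel

    # Sensitivity: of the cases the panel judged urgent, how many did the tool flag?
    urgent_cases = [t for t, c in pairs if c <= urgent_cutoff]
    if urgent_cases:
        sensitivity = sum(t <= urgent_cutoff for t in urgent_cases) / len(urgent_cases)
    else:
        sensitivity = math.nan
    return {
        "exact_match": exact,
        "over_triage": over_triage,
        "under_triage": under_triage,
        "urgent_sensitivity": sensitivity,
    }


# Invented example: ten vignettes with panel consensus and tool output.
consensus = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
tool = [1, 2, 2, 3, 3, 2, 4, 5, 5, 4]
for name, value in compare_to_consensus(tool, consensus).items():
    print(f"{name}: {value:.2f}")
```

Here the tool matches the panel in half the cases, under-triages 30% and over-triages 20%, and flags three of the four cases the panel judged urgent. Under-triage is often regarded as the more serious error, since it delays care for patients whose condition may worsen.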

Wallis et al21 used consensus from Delphi methodology to establish triage acuity levels against which to test pre-hospital mass casualty triage tools; a similar methodology could be used to assess ED triage tools.

Two features make the Delphi methodology particularly well suited to assessing ED triage tools in developing countries. Because panellists respond anonymously, the Delphi technique eliminates potential bias due to group dynamics; it is also financially feasible.

Limitations of the Delphi technique are mostly a result of poorly conducted studies rather than fundamental problems. One commonly cited weakness is that response rates can be low and often decrease as the rounds progress. In practice, however, non-response is typically very low, since most researchers have personally obtained assurance of participation. Similarly, attrition tends to be low, and the researcher can easily ascertain its cause by talking with those who drop out.22 Selection of the Delphi panel depends on the research question. Problems of representativeness may arise, because only experts with an interest in, and involvement with, the topic are likely to participate. Another potential weakness of the Delphi as a consensus method is that, in seeking consensus, it may overlook important minority views.23 Despite these limitations, we believe that the Delphi process is the most appropriate method with which to test the validity of triage tools in the developing world.


In developing countries, a form of construct validity derived from a consensual process appears to be the most appropriate way to validate triage tools. This is because there is no criterion for true acuity, because confounding variables relate to differences in health care resources between levels of development, and because triage scales developed elsewhere lack external validity.24 We propose using the Delphi method to test the South African Triage Scale, as an example of construct validity testing in the developing world.



  • Funding: None

  • Competing interests: None

  • Contributions: LW had the original idea; MT wrote the first draft; both authors contributed to the final article.