Objective To evaluate the construct of triage acuity as measured by the South African Triage Scale (SATS) against a set of reference vignettes.
Methods A modified Delphi method was used to develop a set of reference vignettes. Delphi participants completed a 2-round consensus-building process, and independently assigned triage acuity ratings to 100 written vignettes unaware of the ratings given by others. Triage acuity ratings were summarised for all vignettes, and only those that reached 80% consensus during round 2 were included in the reference set. Triage ratings for the reference vignettes given by two independent experts using the SATS were compared with the ratings given by the international Delphi panel. Measures of sensitivity and specificity, and the associated percentages for over-triage/under-triage, were used to evaluate the construct of triage acuity (as measured by the SATS) by examining the association between the ratings by the two experts and the international panel.
Results On completion of the Delphi process, 42 of the 100 vignettes reached 80% consensus on their acuity rating and made up the reference set. On average, over all acuity levels, sensitivity was 74% (CI 64% to 82%), specificity 92% (CI 87% to 94%), under-triage occurred 14% (CI 8% to 23%) and over-triage 12% (CI 6% to 20%) of the time.
Conclusions The results of this study provide an alternative to evaluating triage scales against the construct of acuity as measured with the SATS. This method of using 80% consensus vignettes may, however, systematically bias the validity estimate towards better performance.
With increased overcrowding in emergency centres (ECs), triage has been highlighted as a crucial process in prioritising patients based on medical urgency.1–3 Ideally, a triage scale should be highly reliable and valid.4 Reliability tells us how much agreement there is among staff triaging the same patient, while validity tells us how close the assigned acuity rating is to the true acuity of that patient.5
Much has been published on the validity of different triage instruments.6–16 However, there is a lack of consistency in the statistical measures and reference standards used to report on validity, which makes it difficult to draw comparisons between different triage scales. Before addressing the type of statistical measure used, a reference standard should be introduced when assessing whether the triage tool correctly identifies the true acuity of the patient. Previous discussions in the literature note the fact that no gold standard exists against which triage scales can be validated.17 ,18 Therefore, one of the challenges in estimating validity lies in the task of identifying an appropriate reference for the true acuity of the patient.17 Some studies have assessed the validity of triage scales by using either one or two local experts, or outcome markers, such as mortality rates, hospital admission rates, resource utilisation and length of stay in hospital.11 ,19 ,20 Although these markers may be adequate in developed countries, they have limitations in less developed countries that are more adversely affected by factors, such as poor record keeping and limited resources. In less developed countries, such as South Africa, the modified Delphi method has been recommended as an alternative that may be used to develop an objective reference standard to evaluate triage scales, and overcomes factors, such as poor record keeping or ineffective care.17
Since 1950, the Delphi method has diversified and has been used more frequently in healthcare research over the past two decades.21 This consensus-building technique is designed to gain insight into a particular field by constructing consensual criteria to enable decision making in areas where published information is scarce or non-existent.22 The approach establishes an appropriate panel of experts who have agreed to partake in an iterative process on a particular issue with the key objective being to reach consensus.23 The Delphi consensus method has been previously used to identify clinical criteria that define triage priority in a major incident setting, and has been recommended as an alternative when evaluating triage scales.24 The South African Triage Scale (SATS) is a four-level system that objectively categorises the medical urgency of EC patients based on age-appropriate physiology and clinical discriminators.25 The aim of this study is to evaluate the construct of triage acuity as measured by the SATS against a set of reference vignettes.
This study was conducted in two stages. In the first stage, we undertook a modified Delphi method (figure 1) using a two-round consensual process, based on a series of clinical vignettes that had been collected from real EC patient presentations. The source and method of collection of these vignettes have been presented elsewhere.25 ,26 In the second stage, we studied the agreement between the two experts’ SATS ratings and the consensus ratings of the reference vignettes as a validation measure for the construct of triage acuity.
Stage one: modified Delphi method
The authors identified international experts in the field of EC triage; 34 experts from developed and less developed countries were approached to take part in a modified two-round Delphi study, and 18 (53%) agreed. Participants were selected from countries where triage scales had already been developed and validated or were in the process of being developed and validated. Seventeen of the 18 participants who agreed to take part came from developed countries. The identity of participants (including emergency physicians and nurses) was undisclosed to the other panel members.
Methods of measurement
Participants were emailed and asked to independently triage written sets of vignettes using the tool they were most familiar with in their day-to-day practice. The categories they assigned had to fall into one of four triage acuity levels described in table 1. Participants were unaware that these were SATS categories. The vignettes were made available online for easy access at a time that was convenient for participants.
Participants completed a two-round consensus-building process, and independently assigned triage acuity ratings to 100 written vignettes unaware of the ratings given by others. Based on extensive use in other studies, and its advantage of saving cost and time, vignettes were used as a suitable estimate of live triage cases.28–30 The 100 adult vignettes were prospectively abstracted from randomly selected actual EC case presentations at a secondary hospital in South Africa. Vignettes included gender, age, presenting complaint, mode of arrival and vital signs. Appendix A shows examples. Consensus was built on the acuity ratings given to each vignette, and those reaching 80% or more group consensus were included.31 The 80% consensus level is commonly used in Delphi studies and was, therefore, set as the group consensus level required for inclusion of vignettes into the reference set.31 ,32 Participant anonymity was maintained throughout the process, and controlled feedback was provided in the form of a statistical aggregation of the group response after each iterative round.28
Round 1: Using an online system, Delphi participants were asked to assign acuity categories as indicated in table 1 to a set of 100 vignettes (see Appendix A showing an example of round 1). Any vignettes reaching 80% consensus after round 1 were removed.
Round 2: Vignettes that did not reach consensus were sent back to the Delphi participants. For each vignette, the acuity level assigned by the majority was indicated, and participants were given a chance to either change their original acuity rating assigned to each vignette or leave the rating unchanged (Appendix A shows an example of round 2). On completion of round 2, triage ratings were summarised for all vignettes, and only those that reached 80% group consensus on their acuity rating on either round were included in the set of reference vignettes.31
Participant responses of the first and second rounds of the Delphi process were summarised using STATA statistical software package V.9.2.33 Descriptive statistics of assigned acuity levels for the 100 vignettes were summarised for rounds 1 and 2. Based on previous studies and recommendations in the Delphi literature, only the vignettes that reached 80% consensus were included in the set of reference vignettes.31
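The inclusion rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' STATA analysis: the function names, the vignette identifiers and the example ratings are all hypothetical, and consensus is assumed here to mean the share of panellists who chose the most frequently assigned acuity level.

```python
from collections import Counter

def modal_consensus(ratings):
    """Return (most common acuity level, share of raters who chose it)."""
    level, count = Counter(ratings).most_common(1)[0]
    return level, count / len(ratings)

def build_reference_set(panel_ratings, cutoff=0.80):
    """Keep only vignettes whose modal rating meets the consensus cut-off."""
    reference = {}
    for vignette_id, ratings in panel_ratings.items():
        level, share = modal_consensus(ratings)
        if share >= cutoff:
            reference[vignette_id] = level
    return reference

# Hypothetical ratings from an 18-member panel (not data from this study).
example_panel = {
    "v1": ["emergency"] * 16 + ["very urgent"] * 2,  # 16/18 = 89% consensus
    "v2": ["urgent"] * 10 + ["very urgent"] * 8,     # 10/18 = 56% consensus
    "v3": ["routine"] * 15 + ["urgent"] * 3,         # 15/18 = 83% consensus
}

reference_set = build_reference_set(example_panel)
print(reference_set)
```

In this made-up example only "v1" and "v3" would enter the reference set. Passing a lower `cutoff` shows how relaxing the threshold admits more vignettes.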
Stage two: cross-sectional validation of the construct of triage acuity
South African raters
Two South African experts (not part of the Delphi panel) with knowledge and experience of the SATS took part in this cross-sectional validation.
Data collection and analysis
Participants were asked to independently assign acuity ratings to the 42 reference vignettes (as derived in stage 1) using the SATS as their method of triage (table 1). The SATS categories assigned by the two South African experts were compared with the reference acuity levels assigned by the Delphi group. Validity was evaluated by calculating the sensitivity, specificity, and associated over-triage/under-triage relative to the Delphi acuity ratings. Histograms were designed to illustrate and visually compare mis-triage at each acuity level.
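As a rough illustration of these measures, the sketch below computes per-level sensitivity and specificity and the overall over-triage/under-triage proportions from pairs of (reference level, assigned level). It is an assumption-laden sketch rather than the analysis used in this study: the function name, the one-versus-rest treatment of each acuity level and the example pairs are hypothetical. Under-triage is taken to mean an assigned level less urgent than the reference, and over-triage a more urgent one.

```python
# Lower number = more urgent; level names follow the four acuity levels.
URGENCY = {"emergency": 1, "very urgent": 2, "urgent": 3, "routine": 4}

def triage_validity(pairs):
    """pairs: list of (reference_level, assigned_level) tuples."""
    n = len(pairs)
    # Under-triage: assigned level is less urgent than the reference level.
    under = sum(URGENCY[a] > URGENCY[r] for r, a in pairs)
    # Over-triage: assigned level is more urgent than the reference level.
    over = sum(URGENCY[a] < URGENCY[r] for r, a in pairs)
    per_level = {}
    for level in URGENCY:
        # One-versus-rest counts for this acuity level.
        tp = sum(r == level and a == level for r, a in pairs)
        fn = sum(r == level and a != level for r, a in pairs)
        tn = sum(r != level and a != level for r, a in pairs)
        fp = sum(r != level and a == level for r, a in pairs)
        per_level[level] = {
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
        }
    return {"under_triage": under / n, "over_triage": over / n,
            "per_level": per_level}

# Hypothetical (reference, assigned) pairs, not data from this study.
example_pairs = [
    ("emergency", "emergency"),
    ("emergency", "very urgent"),   # under-triaged by one level
    ("very urgent", "very urgent"),
    ("urgent", "very urgent"),      # over-triaged by one level
    ("urgent", "urgent"),
    ("routine", "routine"),
]

result = triage_validity(example_pairs)
print(result["under_triage"], result["over_triage"])
```

How the per-level values are averaged across acuity levels is a separate choice that this sketch leaves open.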
On completion of round 2, 80% group consensus was reached on the acuity level of 42 of the 100 vignettes (emergency n=9; very urgent n=17; urgent n=10; routine n=6). Appendix B includes the 42 vignettes with their respective acuity ratings that make up the reference set. Appendix C includes vignettes that did not reach 80% group consensus, but revealed only one acuity level discrepancy among raters (n=17). The highest group consensus among these vignettes ranged from 61% to 78%.
The vignettes with the lowest plotted percentage group consensus on acuity rating (n=10) are described in Appendix D; these ratings were divided across three to four acuity levels. In all 10 vignettes, the highest group consensus ranged from 50% to 56%. The lack of consensus seen in these 10 vignettes pertains mostly to the acuity level ‘urgent’ indicating systematic bias in the middle acuity categories, which is also visible in figure 2. If we had lowered the consensus cut-off to 78%, there would have been an increase in the vignettes with acuity ratings ‘very urgent’ and ‘urgent’, bringing the total number of vignettes with group consensus up to 55. Lowering the consensus cut-off further to 72% would produce an even further increase in vignettes with acuity ratings ‘very urgent’ and ‘urgent’, bringing the total number of vignettes with group consensus up to 69.
Evaluating the construct of triage acuity as measured with the SATS
The 42 vignettes that reached 80% group consensus were used as a reference standard to assess the construct of triage acuity as measured by SATS. Two South African experts triaged the 42 vignettes (84 ratings). Table 2 summarises the sensitivity analysis and shows that average sensitivity and specificity across all categories was 74% (CI 64% to 82%) and 92% (CI 87% to 94%), respectively. Average under-triage across all categories (14%; CI 8% to 23%), occurred as frequently as average over-triage across all categories (12%; CI 6% to 20%) (relative to the acuity assigned by the Delphi panel).
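The method used to obtain these confidence intervals is not stated here. As one common option for a proportion based on a modest number of ratings, a Wilson score interval could be computed as sketched below; the function name and the example counts are hypothetical and are not taken from this study.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical example: 30 correct ratings out of 40 (illustrative counts only).
low, high = wilson_interval(30, 40)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and behaves reasonably for proportions near the extremes.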
Figure 3 summarises all vignettes with acuity levels as assigned by the South African experts, and illustrates the probability that they will over-triage/under-triage vignettes at each acuity level relative to the acuity assigned by the Delphi panel. It shows that about 44% of the vignettes given an ‘emergency’ acuity level by the Delphi panel were under-triaged by one acuity level by the South African experts. About 6% of the vignettes categorised ‘very urgent’ by the Delphi panel were under-triaged as ‘urgent’ by the South African experts, and about 30% of the vignettes assigned ‘urgent’ by the Delphi panel, were over-triaged as ‘very urgent’ by the South African experts.
This study used a modified Delphi method to build consensus on 100 vignettes, of which 42 reached the minimum group consensus of 80% and 58 did not. The 42 consensus vignettes were then used to evaluate the construct of triage acuity as measured by the SATS.
The modified Delphi method may be seen as a form of measuring consensual validation that may be useful when undertaking triage-scale validations or comparing triage-scale validity in less developed countries. In developed countries, triage scales are often validated against outcome markers, such as admission, death, length of stay and resource utilisation. These markers assume systematic and comprehensive record keeping as well as effective care and adequate resources, which in a less developed country may be limited and, therefore, inappropriate.17
Validation may be performed using a reference standard of underlying acuity that is consensually arrived at. This has been shown in previous studies where consensus from Delphi methodology was used to establish triage acuity levels against which prehospital mass casualty triage tools were tested.32 It has been suggested that EC triage scales be evaluated in the same way.17 ,24 The 18 experts were not asked to use the SATS. Instead, they were asked to use any triage scale, decision tree or triage algorithm of their choice to triage the set of vignettes, provided they assigned the vignettes to one of the four given priority levels, whereas the South African experts actually applied the SATS to determine the priority of each vignette. The ratings of the 42 vignettes that reached 80% consensus were, in turn, used as a reference standard against which the ratings of the two South African experts could be evaluated. The benefits of using the results of this modified Delphi process for further triage scale validation studies are that it provides a reference standard that is complementary to other validation studies and potentially more appropriate and financially viable for less developed countries.
An accepted performance indicator for measuring the validity of triage scales is sensitivity, which reflects an inherent characteristic of the scale and is reported together with the percentages of over-triage/under-triage (in this study, relative to the Delphi panel ratings). To our knowledge, no accepted norms exist for over-triage/under-triage at each acuity level. The accepted range for average under-triage of not more than 5–10%, which the American College of Surgeons Committee on Trauma (ACSCOT) considers unavoidable, and the associated average over-triage rate of 30–50% apply exclusively to trauma patients and, therefore, have limited use when assessing a mixed patient population of trauma and non-trauma cases.34 A recent retrospective cohort study indicated that achieving these ACSCOT benchmarks was not feasible in Pennsylvania, and that these guidelines needed modification if they were to be implemented.35
The SATS demonstrated satisfactory average sensitivity of 74% (CI 64% to 82%) and specificity of 92% (CI 87% to 94%). The extent of average over-triage was 12% (CI 6% to 20%) and under-triage 14% (CI 8% to 23%). High percentages of under-triage are a concern for patient care, as they imply longer waiting times and delayed definitive care, which can lead to increased mortality and morbidity. High percentages of over-triage do not directly impinge on patient care, but may indirectly compromise patient care for the collective because overstretched and limited resources are diverted from those in genuine need who are truly a higher priority. Over-triage is, therefore, an important consideration in resource-poor settings, where resource allocation, if inappropriately prioritised, may lead to loss of life.
The use of paper-based vignettes is a limitation, as contextual information and subtle non-verbal cues are lost. Having video recordings of actual patients would have provided a more realistic setup but was ethically not possible. We thus tried to overcome this barrier by observing real patients prospectively and transcribing the situation into vignettes with as much detail as possible.
The selection of the Delphi panel was determined by the research question and may have been limited in its representativeness, as only those experts with an interest may have become involved as participants. We tried to address this by inviting experts from a nursing and emergency physician background from developed and less developed countries. However, the majority responded from developed countries.
Limitations of the Delphi method are mostly a result of poorly conducted studies rather than fundamental problems. Response rates can be low and may decrease in the second round.29 However, non-response and attrition did not occur in this study as each Delphi panellist personally gave assurance of their participation.
Another potential weakness of the Delphi method is that it may overlook important minority issues because it tries to obtain consensus.36 This method of using 80% consensus vignettes may also systematically bias the validity estimate towards better performance. However, despite these limitations, we feel that the Delphi method is an appropriate alternative to develop a reference with which to test the validity of triage scales in less developed countries.
The results of this study provide an alternative to evaluating triage scales against the construct of acuity as measured with the SATS. By utilising this set of reference vignettes, developed via a modified Delphi consensus method, we may be able to perform more comparative studies on triage scales in less developed countries and overcome some of the common barriers (such as poor record keeping and resource limitations) experienced.
The authors would like to acknowledge the involvement of all participants on the Delphi panel as well as the two local experts from South Africa whose contribution helped make this project possible.
Contributors MT conceived the idea, did technical data collection, completed the analysis and wrote the first draft. LAW and JEM contributed substantially to the final draft.
Competing interests None.
Ethics approval This study was granted approval from the research ethics committee, University of Cape Town (REC REF 063/2005).
Provenance and peer review Not commissioned; externally peer reviewed.