One-two-triage: validation and reliability of a novel triage system for low-resource settings
  1. Ayesha Khan1,
  2. S V Mahadevan1,
  3. Andrea Dreyfuss2,
  4. James Quinn1,
  5. Joan Woods3,
  6. Koy Somontha3,
  7. Matthew Strehlow4
  1. 1Division of Emergency Medicine, Stanford University, Stanford, California, USA
  2. 2Department of Emergency Medicine, Highland General Hospital, Oakland, California, USA
  3. 3University Research Co. Centre for Human Services, Phnom Penh, Cambodia
  4. 4Division of Emergency Medicine, Stanford University, Stanford, California, USA
  1. Correspondence to Dr Ayesha Khan, 300 Pasteur Drive, Alway Bldg., Room M121, Stanford, CA 94305, USA; akhanx{at}


Objectives To validate and assess reliability of a novel triage system, one-two-triage (OTT), that can be applied by inexperienced providers in low-resource settings.

Methods This study was a two-phase prospective, comparative study conducted at three hospitals. Phase I assessed criterion validity of OTT on all patients arriving at an American university hospital by comparing agreement among three methods of triage: OTT, Emergency Severity Index (ESI) and physician-defined acuity (the gold standard). Agreement was reported as normalised and raw weighted Cohen κ using two different weighting scales: expert-weighted and triage-weighted κ. Phase II tested reliability, reported as Fleiss κ, of OTT using standardised cases among three groups of providers at an urban and a rural Cambodian hospital and the American university hospital.

Results Normalised for prevalence of patients in each category, OTT and ESI performed similarly well for expert-weighted κ (OTT κ=0.58, 95% CI 0.52 to 0.65; ESI κ=0.47, 95% CI 0.40 to 0.53) and triage-weighted κ (OTT κ=0.54, 95% CI 0.48 to 0.61; ESI κ=0.57, 95% CI 0.51 to 0.64). Without normalising, agreement with the gold standard was lower for both systems but performance of OTT and ESI remained similar for expert-weighted κ (OTT κ=0.57, 95% CI 0.52 to 0.62; ESI κ=0.6, 95% CI 0.58 to 0.66) and triage-weighted κ (OTT κ=0.31, 95% CI 0.25 to 0.38; ESI κ=0.41, 95% CI 0.35 to 0.4). In the reliability phase, all triagers showed fair inter-rater agreement (Fleiss κ=0.308).

Conclusions OTT can be reliably applied and performs as well as ESI compared with gold standard, but requires fewer resources and less experience.

  • triage
  • global health
  • emergency care systems, efficiency

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Key messages

What is already known on this subject?

  • Current triage scales are designed for healthcare settings in high-income countries, requiring experienced medical providers or specialised algorithms. Due to a dearth of experienced emergency healthcare providers and limited resources, many hospitals in low- and middle-income countries (LMICs) have forgone triage altogether or adapted established triage scales with varying success. Despite the clear need for a triage system designed specifically for LMICs that can be ‘universally and broadly implemented’, no comprehensive, reliable system has been established.

What might this study add?

  • Our study shows that one-two-triage, a triage scale created for low-resource settings, can be reliably applied by inexperienced providers in LMICs and is valid in determining acuity of patients presenting to the ED.


EDs require a systematic approach to prioritise patient care depending on acuity.1 ED triage scales standardise initial patient assessment and reduce wait times for patients with life-threatening conditions, thereby minimising unnecessary deaths and disability.2 However, current triage scales are designed for healthcare settings in high-income countries (HICs), requiring experienced medical providers or specialised algorithms. Due to a dearth of experienced emergency healthcare providers and limited resources, many hospitals in low- and middle-income countries (LMICs) have forgone triage altogether; others have adapted established triage scales with varying success.3,4 Thus, although current triage systems theoretically apply to health professionals in various contexts, they are ineffective in many developing nations.

Prior to this study, most government hospitals in Cambodia lacked a standardised system for sorting patients based on acuity. Instead, patients typically self-triaged to the hospital department deemed most appropriate (eg, ED, intensive care unit (ICU), inpatient wards) and were seen in the order of arrival. As the number of Cambodian patients with acute medical emergencies has sharply risen,5 there is an immediate need to implement a validated triage system for adult and paediatric patients designed specifically for LMICs.

A panel of experts from Stanford University developed a triage system to be implemented within Cambodia's current healthcare infrastructure: one-two-triage (OTT). Named to highlight the two-stage process of collating patients by severity—the isolation of critical/emergent from non-critical patients followed by the separation of urgent from non-urgent patients—OTT was designed specifically for use in training-limited and resource-limited settings. OTT demands minimal medical knowledge for the initial information-gathering stage, can be applied to both adult and paediatric patients, and does not require a full set of vital signs to identify the most critical cases. At the request of the Cambodian Ministry of Health, Stanford University and University Research Consortium partnered to implement OTT in government hospitals across Cambodia. The current study aimed to validate OTT and test its reliability following implementation in two Cambodian government hospitals.


Study design and setting

This two-phase prospective, comparative study evaluated the validity and reliability of a novel triage system. Phase I evaluated the validity of OTT at Stanford University Hospital (SUH). Phase II tested the reliability of OTT at SUH, and at Siem Reap Provincial Hospital (SRPH) and Mehmot Referral Hospital (MRH) in Cambodia.

The three study hospitals are demographically distinct. SUH is an academic, adult and paediatric trauma centre with an annual volume of 60 000 patients. SRPH is a public referral hospital providing adult emergency services to nearly 75 000 patients annually. MRH is a district-level facility that receives 7000 adult and paediatric patients per year. SUH employs the Emergency Severity Index (ESI) as a means of triaging patients, while neither SRPH nor MRH used a formal triage process prior to OTT.

Methods and measurements

Phase I included patients presenting to the SUH ED during 36 eight-hour shifts between August and December 2013. Enrolment periods were split among day, night, weekday and weekend shifts. Twelve Stanford undergraduate students with no prior medical training enrolled patients and collected data after receiving a 1-day live training course on OTT (see online supplementary appendix E). The medical inexperience of the undergraduates was comparable to that of Cambodian nurses using the triage system for the first time. For each enrolled patient, the students assigned OTT acuity and recorded both ESI acuity (assigned by a trained ED triage nurse) and physician-defined acuity (assigned by the board-certified emergency medicine (EM) physician on shift) (figure 1A).

Figure 1

(A) Phase I: patients enrolled and triaged at Stanford. In phase I of the protocol, validity was assessed at Stanford University Hospital (SUH). Patients were triaged using Emergency Severity Index (ESI) by a nurse in the usual course of patient care, and they were assigned acuity using one-two-triage (OTT), in parallel, during their triage assessment by an undergraduate student. The student then asked the attending physician responsible for the patient to assign physician-defined acuity (PDA) before the medical evaluation of the patient. Agreement between ESI and PDA and agreement between OTT and PDA were analysed using raw and weighted κ. (B) Phase II: 63 standardised scenarios assigned OTT acuity. In phase II of the protocol, reliability was assessed by three different groups of providers assigning OTT acuity to 63 written cases. The three groups included nurses from Siem Reap Provincial Hospital (SRPH), Mehmot Referral Hospital (MRH) and Stanford University Hospital (SUH). Training in OTT was provided to the Cambodian nurses via a 1-day live workshop and to the Stanford nurses via a 1-hour training video. Fleiss κ was used to analyse agreement.

To assess validity, OTT-assigned acuity was compared with physician-defined acuity. We designated physician-defined acuity as the gold standard based on previous studies demonstrating that physician intuition better predicts patient mortality than exogenous physiological scoring systems and that years of experience generate accurate acuity assessment.6–9 For this study, physician-defined acuity was determined by board-certified EM-trained physicians at levels 1–4 corresponding to an ideal time to assessment of 0, 10, 30 and 120 min, respectively. Physicians were blinded to the acuity levels assigned by the alternative systems. As experienced board-certified EM-trained physicians are unavailable in Cambodia, SUH was chosen as the study site for phase I.

We also compared OTT with an established triage system, ESI. Rather than use a direct comparison, we measured both systems against physician-defined acuity, thus also testing the agreement between ESI and physician-defined acuity. Two a priori decisions were made to allow comparison between the two systems: (1) Because the ESI and OTT triage scales differ in the absolute number of acuity levels, ESI levels 4 and 5 were combined; these levels differ in resources used but not in acuity, which is our primary concern. (2) Major and minor trauma activations were assigned ESI levels 1 and 2, respectively, based on analogous criteria for acuity assignment between scales (see online supplementary appendix A).
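The two mapping decisions above can be sketched in Python. This is only an illustration under stated assumptions, not the study's code: the function name `esi_to_four_levels` and the `trauma_activation` argument are hypothetical, and the sketch assumes a trauma activation takes precedence over any recorded ESI level. The full criteria are in online supplementary appendix A, which is not reproduced here.

```python
from typing import Optional


def esi_to_four_levels(esi_level: Optional[int] = None,
                       trauma_activation: Optional[str] = None) -> int:
    """Map an ESI assignment onto a four-level scale for comparison.

    Hypothetical helper. Assumption: a trauma activation overrides the
    recorded ESI level (major -> 1, minor -> 2). Otherwise ESI levels
    4 and 5 are merged into a single lowest-acuity level 4.
    """
    if trauma_activation == "major":
        return 1
    if trauma_activation == "minor":
        return 2
    if esi_level not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected ESI level: {esi_level!r}")
    return min(esi_level, 4)  # ESI 5 collapses into level 4


if __name__ == "__main__":
    for level in (1, 2, 3, 4, 5):
        print(level, "->", esi_to_four_levels(level))
```

Merging only the two lowest levels leaves the acuity ordering of levels 1 to 3 untouched, so agreement with physician-defined acuity can be computed on a common four-level scale.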

For phase I, sample size for unweighted κ was calculated using the R package kappaSize. The ‘Power4Cats’ function described therein was used because this was a four-category κ. The null hypothesis was that all observers assigned ratings randomly according to the usual ESI level breakdown at the SUH ED (level 1 (15%), level 2 (25%), level 3 (25%) and level 4 (35%)); under this breakdown, we estimated a sample size of 120 patients for 90% power. To allow for the most extreme scenario of unevenly distributed triage (level 1 (5%), level 2 (5%), level 3 (85%) and level 4 (5%)), variability among groups of raters and weighting of the κ, we estimated a sample size of 335 patients for 90% power.

Phase II assessed the reliability of OTT through inter-rater agreement among 52 healthcare providers, each evaluating the complete set of 63 standardised, written EM cases. The participating providers comprised three groups: 8 nurses from SUH, 21 nurses and medical assistants from SRPH and 23 nurses and medical assistants from MRH. SUH nurses were trained to use OTT via a 1-hour online training module. The Cambodian providers at SRPH and MRH received a 1-day, live training course taught in their native language (Khmer) (figure 1B). The difference in training duration reflects the existing familiarity with triage among SUH staff; the SUH module highlighted OTT-specific differences in triage, whereas the module for Cambodian providers was more comprehensive. The written EM case scenarios were based on cases commonly encountered in Cambodian EDs; they were devised and tested for agreement by a team of eight board-certified EM physicians from Stanford following consultation with a doctorate-level education specialist within the SUH Department of Surgery. The 35 adult and 28 paediatric cases were distributed across chief complaint categories and OTT acuity, as shown in table 1, with red being critical, orange emergent, yellow urgent and green non-urgent.

Table 1

Phase II: breakdown of standardised scenarios


Our primary study outcome for phase I was agreement between physician-defined acuity (gold standard) and assigned OTT triage level. Agreement was measured using a normalised, asymmetrical Cohen's weighted κ as it adjusts for degrees of discordance within ordinal scales. κ was weighted asymmetrically to reflect the greater clinical consequences of under-triaging versus over-triaging patients. Under-triaging—assigning a lower acuity than the patient's presentation warrants—can delay care with serious clinical consequences, whereas over-triaging—assigning a greater acuity than warranted—carries a lower risk of direct patient harm. Two different asymmetrically weighted scales were used: triage-weighted κ, introduced by Van der Wulp et al,10 and expert-weighted κ, a scale dependent on the consensus opinion of EM physicians at SUH ED (see online supplementary appendix B). To prevent signal loss from the relatively few high-acuity cases, the number of patients in each category was normalised to equal one another (see online supplementary appendix C). Secondary outcomes included raw agreement between OTT and physician-defined acuity (without weighting or normalisation), agreement between ESI and physician-defined acuity and comparative performance of OTT and ESI.
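A minimal Python sketch of an asymmetrically weighted κ may help make this outcome concrete. It uses the general disagreement-weight form, κ = 1 − Σ w·observed / Σ w·expected, where expected proportions come from the row and column marginals. Everything specific here is an assumption for illustration: the function names are hypothetical, the 2:1 under-triage to over-triage penalty is invented (the actual triage-weighted and expert-weighted scales are in online supplementary appendix B), and normalisation is assumed to mean rescaling each gold-standard category to the same total (the actual procedure is in appendix C).

```python
def asymmetric_weights(n_levels, under_penalty=2.0, over_penalty=1.0):
    """Hypothetical disagreement weights (rows = gold standard, cols = triage).

    Level 1 is the highest acuity, so column index > row index means the
    patient was under-triaged. Diagonal (agreement) weights are 0.
    """
    return [[(under_penalty if j > i else over_penalty) * abs(j - i)
             for j in range(n_levels)]
            for i in range(n_levels)]


def weighted_kappa(counts, weights, normalise=False):
    """Weighted Cohen's kappa from a square table of counts.

    counts[i][j]: patients with gold-standard level i+1 and triage level j+1.
    weights[i][j]: disagreement penalty, which may be asymmetric.
    normalise: assumption for this sketch, rescale each gold-standard row
    to sum to 1 so every category carries equal weight.
    """
    k = len(counts)
    table = [[float(c) for c in row] for row in counts]
    if normalise:
        table = [[c / sum(row) for c in row] if sum(row) else row
                 for row in table]
    total = sum(sum(row) for row in table)
    p = [[c / total for c in row] for row in table]
    row_m = [sum(row) for row in p]
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]
    observed = sum(weights[i][j] * p[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(weights[i][j] * row_m[i] * col_m[j]
                   for i in range(k) for j in range(k))
    return 1.0 - observed / expected


if __name__ == "__main__":
    w = asymmetric_weights(2)
    under = [[8, 2], [0, 10]]   # two level-1 patients under-triaged
    over = [[10, 0], [2, 8]]    # two level-2 patients over-triaged
    print(weighted_kappa(under, w), weighted_kappa(over, w))
```

With these illustrative weights, the same number of misclassified patients lowers κ more when the errors are under-triage than when they are over-triage, which is the behaviour the asymmetric weighting is meant to capture.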

Our study assessed the criterion validity of OTT using emergency physician-defined acuity as the gold standard for correlation rather than the more commonly used correlation with surrogate markers, such as mortality, admission, length of hospital stay and recidivism.11–14 Since these outcomes depend on external factors, including delay in treatment, inadequate treatment or patient comorbid conditions, they may not accurately validate the system particularly in an LMIC where surrogate markers are heavily influenced by resource availability, adjunct therapies and provider inexperience.14–18 Conversely, criterion validity using our gold standard more directly assesses the triage tool. For further analysis, we examined parity between the criterion validity of OTT and the criterion validity of a well-established, reputable triage scale, ESI, with previously proven correlation to surrogate markers.15

During phase II, we assessed reliability for interobserver agreement using Fleiss κ to account for multiple raters. While Fleiss is typically used for nominal, non-ordinal data, the typology of triage categories is not strictly ordinal in nature as various criteria are applied and intersected to determine a patient's number. For example, ESI is not simply ordinal because there are essentially three categories based on acuity (1, 2 and the remainder) and three categories based on anticipated resource utilisation (3, 4 and 5). A patient may be higher in the resource category but lower in the acuity category and an aggregate of these ordinal scales is applied to determine the overall category, which does not then remain strictly ordinal. Similarly, OTT acuity—like ESI—is an aggregation of various ordinal scales and thus lends itself to Fleiss analysis.19
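For readers unfamiliar with the statistic, the following Python sketch computes Fleiss κ from a table of per-case category counts using the standard formula (mean per-case agreement compared with chance agreement from the overall category proportions). The function names and the toy ratings are hypothetical and are not study data; the equations the authors used are in online supplementary appendix D.

```python
CATEGORIES = ("red", "orange", "yellow", "green")  # OTT acuity colours


def to_count_table(ratings, categories=CATEGORIES):
    """Turn ratings[case] = list of labels (one per rater) into counts."""
    return [[case.count(cat) for cat in categories] for case in ratings]


def fleiss_kappa(count_table):
    """Fleiss' kappa for a table where count_table[i][j] is the number of
    raters who assigned case i to category j.

    Every case must be rated by the same number of raters (at least 2).
    """
    n_cases = len(count_table)
    n_raters = sum(count_table[0])
    if any(sum(row) != n_raters for row in count_table):
        raise ValueError("each case needs the same number of ratings")
    n_cats = len(count_table[0])
    # Agreement within each case: proportion of agreeing rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in count_table]
    p_bar = sum(p_i) / n_cases
    # Chance agreement from the overall share of each category.
    p_j = [sum(row[j] for row in count_table) / (n_cases * n_raters)
           for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy example: 3 written cases, 4 raters each (not study data).
    toy = [
        ["red", "red", "red", "orange"],
        ["yellow", "green", "yellow", "yellow"],
        ["green", "green", "green", "green"],
    ]
    print(fleiss_kappa(to_count_table(toy)))
```

Note that this statistic treats the four colours as unordered categories, so a red-versus-green disagreement counts the same as a red-versus-orange one.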

Data analysis

Data from the validity portion were analysed with Python and STATA. We assessed homogeneity within groups using one-way analysis of variance, treating each triage group as one unit for Cohen-weighted κ.

Calculations were carried out according to formulae described by Fleiss in Measuring nominal scale agreement among many raters using Palantir Metropolis software V. Equations for the κ and variance (which is the square of the SE) can be found in online supplementary appendix D.20 Final assignment of significance of agreement was based on the ratio of absolute κ divided by its SE, where a ratio >1.96 means p<0.05.15
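The significance rule described above (κ divided by its SE, with a ratio above 1.96 corresponding to p<0.05) can be sketched as a two-sided normal test. The function name is hypothetical, and the SE in the usage line is an invented placeholder, since the study's SE values are not given in the text.

```python
import math


def kappa_significance(kappa, standard_error):
    """Return (z, two-sided p) for the ratio |kappa| / SE.

    Assumes the ratio is approximately standard normal under the null
    hypothesis of chance-level agreement.
    """
    z = abs(kappa) / standard_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(z)).
    p_value = math.erfc(z / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # kappa from the abstract; the SE of 0.02 is a hypothetical placeholder.
    z, p = kappa_significance(0.308, 0.02)
    print(f"z = {z:.2f}, significant at 0.05: {p < 0.05}")
```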


Study subjects

During phase I, 482 patients were enrolled and triaged. ESI acuity was collected on 473 of 482 patients; physician acuity was assigned for all 482 patients based on their initial assessment. Of the patients, 231 were male and 251 female, ranging from 7 days to 97 years old with a median age of 36 years. The most common organ systems associated with patient chief complaints were gastrointestinal (20.8%), cardiothoracic (18.3%) and musculoskeletal (13.7%) (table 2).

Table 2

Phase I: breakdown of patients' chief complaint based on organ system

Phase I

The OTT triage system showed moderate agreement with physician-defined acuity using both normalised expert-weighted κ (κ=0.58, 95% CI 0.52 to 0.65) and normalised triage-weighted κ (κ=0.54, 95% CI 0.48 to 0.61). Without normalisation, OTT's agreement with the gold standard remained similar using expert-weighted κ (κ=0.57, 95% CI 0.52 to 0.62) but was weaker using triage-weighted κ (κ=0.31, 95% CI 0.25 to 0.38). Similar to OTT, the ESI triage system moderately agreed with physician-defined acuity. Without normalisation, agreement with the gold standard was less for both triage systems but performance of OTT and ESI remained similar. Table 3 shows the results of asymmetrically weighting the agreement using expert-weighted and triage-weighted κ.

Table 3

Phase I: weighted agreement by triage method

The number of patients assigned to each triage level by the respective triage method is shown in figure 2. OTT characterised slightly more patients as higher acuity (level 1 or 2) than ESI or physician-defined acuity, whereas the physicians characterised more patients as the lowest acuity (level 4). Overall, OTT over-triaged in 34% of cases and ESI in 27%; OTT and ESI both under-triaged 13% of patients.
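Over-triage and under-triage rates of this kind can be tallied from paired ratings, as in the short Python sketch below. The function name and the example pairs are hypothetical; the only convention taken from the text is that level 1 is the highest acuity, so an assigned level numerically lower than the gold standard is over-triage.

```python
def triage_direction_rates(pairs):
    """Share of over-triaged, under-triaged and exactly matched patients.

    pairs: iterable of (gold_level, assigned_level), where level 1 is the
    most acute. assigned < gold means over-triage; assigned > gold means
    under-triage.
    """
    pairs = list(pairs)
    n = len(pairs)
    over = sum(1 for gold, assigned in pairs if assigned < gold)
    under = sum(1 for gold, assigned in pairs if assigned > gold)
    return {
        "over": over / n,
        "under": under / n,
        "exact": (n - over - under) / n,
    }


if __name__ == "__main__":
    # Invented pairs for illustration only (not study data).
    example = [(1, 1), (2, 1), (3, 4), (4, 4)]
    print(triage_direction_rates(example))
```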

Figure 2

Phase I: number of patients assigned to each acuity level by triage method: one-two-triage (OTT), Emergency Severity Index (ESI) and physician-defined acuity (PDA). The group of patients enrolled in phase I showed a similar distribution across acuity levels when triaged by OTT, ESI and PDA. OTT characterised slightly more patients as high acuity and PDA characterised slightly more patients as the lowest acuity.

Phase II

In the reliability portion (phase II), all healthcare providers using OTT showed fair inter-rater agreement across questions (Fleiss κ; κ=0.308). By individual site, SUH nurses showed substantial agreement (κ=0.717), whereas MRH and SRPH providers exhibited fair agreement (κ=0.334 and κ=0.269, respectively). When considering adult-specific and paediatric-specific questions, reliability remained consistent (figure 3). Finally, when accounting for SE, all Fleiss κs demonstrated agreement in the assignment of the four triage categories significantly greater than by chance (figure 4).
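The verbal labels used here (fair, moderate, substantial) are consistent with the widely used Landis and Koch bands for κ, although the text does not state which convention was applied. Assuming those bands, a small hypothetical helper reproduces the labels reported above.

```python
def describe_kappa(kappa):
    """Label a kappa value using the Landis and Koch bands.

    Assumption: the article's wording follows these bands; this is an
    inference from the reported values, not a statement from the authors.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    for site, kappa in (("All", 0.308), ("SUH", 0.717),
                        ("MRH", 0.334), ("SRPH", 0.269)):
        print(site, kappa, describe_kappa(kappa))
```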

Figure 3

Phase II: agreement between triagers on standardised questions by site. Reliability was assessed by three different groups of providers assigning one-two-triage acuity to 63 written cases. The three groups included nurses from Siem Reap Provincial Hospital (SRPH), Mehmot Referral Hospital (MRH) and Stanford University Hospital (SUH). Using Fleiss κ, the SUH nurses showed substantial agreement and the MRH and SRPH nurses showed fair agreement across all scenarios. When adult and paediatric scenarios were considered separately, agreement remained the same.

Figure 4

Phase II: test of significance for Fleiss κ. Reliability in phase II of the study was assessed by Fleiss κ. When accounting for SE, all Fleiss κs demonstrated agreement in the assignment of the four triage categories significantly greater than by chance.


In this study, we developed a triage tool for LMICs and tested it for validity and reliability. Despite the clear need for a triage system designed specifically for LMICs that can be ‘universally and broadly implemented’, no comprehensive, reliable system has been established. OTT addresses this need as it was designed to function in low-resource settings with limited provider training and little access to expensive technologies. The current study demonstrated that OTT can be reliably applied after a short training module for triagers in LMICs. OTT demonstrated moderate agreement in determining patient acuity compared with expert EM physician assessment. Importantly, OTT performed similarly to ESI, an internationally recognised and used triage system, and outperformed ESI for the most critically ill patients.

Prior to OTT, the only widely accepted triage scales developed for LMICs were the emergency triage assessment and treatment (ETAT) and the South African Triage Scale (SATS). ETAT is limited to paediatrics and has been unreliable in LMICs, with large variations in assessment of priority signs even after 20 hours of training.4 Conversely, OTT was reliably applied by Cambodian nurses after only 8 hours of training. A strength of OTT is that it greatly limits the inclusion of signs and symptoms that demand subjective clinical decision-making by the triaging nurse. Stage 2 requires some clinical subjectivity, which is reflected in the greater agreement among US-trained nurses, who have more standardised education and clinical experience than the Cambodian nurses.

SATS is a relatively new triage tool created for low-resource settings with some similarities to OTT; it is comprehensive for adults and paediatrics and can be applied by nursing assistants.21 However, unlike SATS, OTT does not require a full set of vital signs for complete triage; collecting a full set necessitates additional time and staff training. The only two vital signs necessary for OTT are HR and oxygen saturation, both rapidly obtained by portable pulse oximeters, which are relatively low cost (∼US$20), durable and require little maintenance. Abnormal oxygen saturation was previously shown to predict the need for ICU admission.22 Additionally, we validated OTT prospectively with authentic emergency patients, whereas SATS was validated solely using clinical vignettes.21 While we found OTT to slightly over-triage, SATS has a greater tendency to under-triage higher acuity patients.23 Over-triaging siphons ‘less-critical’ patients into the ED, potentially overwhelming medical providers or distracting care away from more acute cases; however, under-triaging critical patients has greater immediate implications for patient outcome.21 The underlying goal of triage is to provide the greatest good for the greatest number, so the sickest patients should be seen first, even if a few patients are occasionally judged as more acute than they are.

At a recent International Federation of Emergency Medicine meeting held on triage for LMICs, it was noted that the development of new triage scales for LMICs may be ‘too daunting’ given the amount of work invested in existing triage scales.24 In part, research and expansion of triage scales in LMICs have been hindered by the lack of a universal method for assessing triage system performance.25 Our study addresses two important factors in the assessment of triage systems: (1) the gold standard of acuity against which system performance is validated and (2) the relative importance of under-triage versus over-triage.

Triage validity is challenging to determine because an effective triage scale measures the potential for deterioration of a patient; however, once a patient is triaged, interventions are implemented to prevent this deterioration. Therefore, we believe real-time assessment by an experienced EM-trained provider better indicates acuity than patient outcomes.

Our study captures the relative importance of under-triage versus over-triage through an asymmetrically weighted agreement scale. The exact risk of over-triaging and under-triaging is challenging to define as it depends on local factors such as patient volume, facility resources and cultural practices.21 Development of a standardised asymmetric weighting scale based on hospital resources by the EM community may result in more robust validation studies in the future.


Our study is limited by (1) the lack of a controlled environment to study triage and (2) the absence of a standard statistic to weigh agreement and determine the clinical relevance of over-triaging versus under-triaging.

Although this study used an accepted gold standard to assess clinical acuity, the Delphi method rather than a single provider may give a stronger consensus opinion, as suggested by Twomey et al.26 Further, we assessed validity in the USA rather than in Cambodia due to a lack of available EM experts in Cambodia. While differences in epidemiology and patient characteristics might theoretically impact the applicability of OTT in LMICs, this novel triage system was designed by a panel of EM experts with clinical experience in LMICs for use in such settings. Assessing reliability using written cases is limiting in that all relevant information is provided to the nurses and not dependent on the nurse eliciting information. Further, visual cues that prompt experienced providers to assign acuity based on gestalt are absent. However, our system is created for low-resource settings, where providers may not have experience, and we aggressively attempted to minimise subjective clinical assessment.

Normalising the prevalence of triage levels introduces an artificial bias since, clinically, equal numbers of patients across all triage levels are not expected. With additional time and resources for sampling, choosing a larger and equal number of patients from each category would be ideal. Normalisation inflates the importance of a small sample of patients beyond what it may represent; this inflation favours a scale that performs well in rare, critically ill patients over a scale that performs best for the larger portion of less sick patients. However, the disproportionate importance of correctly triaging level 1 patients may offset this limitation.

Lastly, a future study will be needed to assess reliability after implementation in other LMICs.


In conclusion, OTT is a valid, reliable and cost-effective alternative to ESI in Cambodia. Notwithstanding difficulties in assessment, OTT selected patients needing emergent and urgent care as effectively as ESI, a universally accepted triage scale, while requiring fewer resources. OTT outperformed ESI when dealing with the most critical patients. Moreover, OTT was applied reliably even when used by inexperienced providers in a low-resource setting. With OTT operational in 22 hospitals and pending implementation in 21 more throughout Cambodia, further research on the operational aspects of the scale, such as time-to-triage and adherence to triaging, is needed and forthcoming. While our study provides the first steps of validity and reliability testing for OTT in Cambodia, it will be important to establish that it can be replicated in other LMICs.


The authors thank Dr Anne Tecklenberg Strehlow for assistance with editing.



  • Contributors AK, MS, AD and JQ conceived the study and designed the trial. AK and AD supervised the conduct of the trial and the data collection. JW and KS aided in data acquisition. AK provided statistical advice and AD analysed the data. AK drafted the manuscript and MS and SM contributed substantially to its revision. AK takes responsibility for the paper as a whole.

  • Funding Use of REDCap was supported by grant UL1 RR025744. Translation and local expenses were funded by the US Agency for International Development (USAID) under Better Health Services project, Cooperative Agreement No. 442-A-00-09-00007-00.

  • Disclaimer The findings of this study are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the US government.

  • Competing interests None declared.

  • Ethics approval Stanford University Hospitals Institutional Review Board, Cambodian Ministry of Health. The SUH Institutional Review Board approved both phases of the study (protocol 24500), and the Ministry of Health in Cambodia approved the triage training.

  • Patient consent Written consent was obtained from all participants in their native language, and Khmer-speaking and English-speaking staff were available for questions.

  • Data sharing statement Study data were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at SUH.

  • Provenance and peer review Not commissioned; externally peer reviewed.